Sorting With data.table Versus dplyr

Kevin Feasel

2018-08-14

R

John Mount shows us that data.table is way faster for sorting than dplyr‘s arrange function:

Notice on the above semi-log plot the run time ratio is growing roughly linearly. This makes sense: data.table uses a radix sort which has the potential to perform in near linear time (faster than the n log(n) lower bound known comparison sorting) for a range of problems (also we are only showing example sorting times, not worst-case sorting times).

In fact, if we divide the y in the above graph by log(rows) we get something approaching a constant.

John has also provided us with a markdown document for comparison.

Related Posts

Deploying An R Service To Azure Kubernetes Service

Hong Ooi shows us how we can use Azure Container Registry and Azure Kubernetes Service to deploy an R model via Plumber: If you run this code, you should see a lot of output indicating that R is downloading, compiling and installing randomForest, and finally that the image is being pushed to Azure. (You will […]

Read More

Road Construction Incentive Contracts And R

Sebastian Kranz promotes an interesting RTutor project: Patrick Bajari and Gregory Lewis have collected a detailed sample of 466 road construction projects in Minnesota to study this question in their very interesting article Moral Hazard, Incentive Contracts and Risk: Evidence from Procurement in the Review of Economic Studies, 2014.They estimate a structural econometric model and find that […]

Read More

Categories

August 2018
MTWTFSS
« Jul Sep »
 12345
6789101112
13141516171819
20212223242526
2728293031