rqdatatable — Wrangling Lots Of Data, Fast

Kevin Feasel

2018-06-04

R

John Mount explains the motivation behind rqdatatable and puts together a performance test:

rquery is already one of the fastest and most teachable (due to deliberate conformity to Codd’s influential work) tools to wrangle data on databases and big data systems. And now rquery is also one of the fastest methods to wrangle data in-memory in R (thanks to data.table, via a thin adaption supplied by rqdatatable).

Teaching rquery and fully benchmarking it is a big task, so in this note we will limit ourselves to a single example and benchmark. Our intent is to use this example to promote rquery and rqdatatable, but frankly the biggest result of the benchmarking is how far out of the pack data.table itself stands at small through large problem sizes. This is already known, but it is a much larger difference and at more scales than the typical non-data.table user may be aware of.

Click through for the benchmark and information on how to grab the package before it goes into CRAN.

Related Posts

Deploying An R Service To Azure Kubernetes Service

Hong Ooi shows us how we can use Azure Container Registry and Azure Kubernetes Service to deploy an R model via Plumber: If you run this code, you should see a lot of output indicating that R is downloading, compiling and installing randomForest, and finally that the image is being pushed to Azure. (You will […]

Road Construction Incentive Contracts And R

Sebastian Kranz promotes an interesting RTutor project: Patrick Bajari and Gregory Lewis have collected a detailed sample of 466 road construction projects in Minnesota to study this question in their very interesting article Moral Hazard, Incentive Contracts and Risk: Evidence from Procurement in the Review of Economic Studies, 2014. They estimate a structural econometric model and find that […]
