rqdatatable — Wrangling Lots Of Data, Fast

Kevin Feasel



John Mount explains the motivation behind rqdatatable and puts together a performance test:

rquery is already one of the fastest and most teachable (due to deliberate conformity to Codd’s influential work) tools to wrangle data on databases and big data systems. And now rquery is also one of the fastest methods to wrangle data in-memory in R (thanks to data.table, via a thin adaption supplied by rqdatatable).

Teaching rquery and fully benchmarking it is a big task, so in this note we will limit ourselves to a single example and benchmark. Our intent is to use this example to promote rquery and rqdatatable, but frankly the biggest result of the benchmarking is how far out of the pack data.tableitself stands at small through large problem sizes. This is already known, but it is a much larger difference and at more scales than the typical non-data.table user may be aware of.

Click through for the benchmark and information on how to grab the package before it goes into CRAN.

Related Posts

Interpreting The Area Under The Receiver Operating Characteristic Curve

Roos Colman explains what a Receiver Operating Characteristic (ROC) curve is and how we interpret the Area Under the Curve (AUC): The AUC can be defined as “The probability that a randomly selected case will have a higher test result than a randomly selected control”. Let’s use this definition to calculate and visualize the estimated […]

Read More

Building A Neural Network In R With Keras

Pablo Casas walks us through Keras on R: One of the key points in Deep Learning is to understand the dimensions of the vector, matrices and/or arrays that the model needs. I found that these are the types supported by Keras. In Python’s words, it is the shape of the array. To do a binary […]

Read More


June 2018
« May Jul »