Using rquery To Speed Up Data Manipulations

Kevin Feasel

2018-01-12

R

John Mount shows off some rquery benchmarks versus dplyr and data.table:

Let’s take a look at rquery’s new “ad hoc” mode (made convenient through wrapr‘s new “wrapr_applicable” feature). This is where rquery works on in-memory data.frame data by sending it to a database, processing on the database, and then pulling the data back. We concede this is a strange way to process data, and not rquery’s primary purpose (the primary purpose being generation of safe high performance SQL for big data engines such as Spark and PostgreSQL). However, our experiments show that it is in fact a competitive technique.

We’ve summarized the results of several experiments (experiment details here) in the following graph (graphing code here). The benchmark task was hand implementing logistic regression scoring. This is an example query we have been using for some time.

There are some nice early results, so it’ll be interesting to watch as this develops.

Related Posts

Biases in Tree-Based Models

Nina Zumel looks at tree-based ensembling models like random forest and gradient boost and shows that they can be biased: In our previous article , we showed that generalized linear models are unbiased, or calibrated: they preserve the conditional expectations and rollups of the training data. A calibrated model is important in many applications, particularly when financial data […]

Read More

R 3.6.1 Available

David Smith notes a new version of R is available: On July 5, the R Core Group released the source code for the latest update to R, R 3.6.1, and binaries are now available to download for Windows, Linux and Mac from your local CRAN mirror. R 3.6.1 is a minor update to R that fixes a few bugs. […]

Read More

Categories

January 2018
MTWTFSS
« Dec Feb »
1234567
891011121314
15161718192021
22232425262728
293031