Using rquery To Speed Up Data Manipulations

Kevin Feasel

2018-01-12

R

John Mount shows off some rquery benchmarks versus dplyr and data.table:

Let’s take a look at rquery’s new “ad hoc” mode (made convenient through wrapr‘s new “wrapr_applicable” feature). This is where rquery works on in-memory data.frame data by sending it to a database, processing on the database, and then pulling the data back. We concede this is a strange way to process data, and not rquery’s primary purpose (the primary purpose being generation of safe high performance SQL for big data engines such as Spark and PostgreSQL). However, our experiments show that it is in fact a competitive technique.

We’ve summarized the results of several experiments (experiment details here) in the following graph (graphing code here). The benchmark task was hand implementing logistic regression scoring. This is an example query we have been using for some time.

There are some nice early results, so it’ll be interesting to watch as this develops.

Related Posts

Creating Map Plots With ggmap

Laura Ellis shows how to use the ggmap package to create choropleth maps in R: In the last map, it was a bit tricky to see the density of the incidents because all the graphed points were sitting on top of each other.  In this scenario, we are going to make the data all one […]

Read More

R 3.5.0 Released

Tal Galili announces that R 3.5.0 is now available: By default the (arbitrary) signs of the loadings from princomp() are chosen so the first element is non-negative. If –default-packages is not used, then Rscript now checks the environment variable R_SCRIPT_DEFAULT_PACKAGES. If this is set, then it takes precedence over R_DEFAULT_PACKAGES. If default packages are not specified on the command line or by one […]

Read More

Categories

January 2018
MTWTFSS
« Dec Feb »
1234567
891011121314
15161718192021
22232425262728
293031