John Mount shares a few examples of partitioning and parallelizing data operations in R:
In this note we will show how to speed up work in R by partitioning data and using process-level parallelization. We will show the technique with three different R packages: rqdatatable, data.table, and dplyr. The methods shown will also work with base-R and other packages.

For each of the above packages we speed up work by using wrapr::execute_parallel, which in turn uses wrapr::partition_tables to partition un-related data.frame rows and then distributes them to different processors to be executed. rqdatatable::ex_data_table_parallel conveniently bundles all of these steps together when working with rquery pipelines.
There were some interesting results. I expected data.table to be fast, but did not expect dplyr to parallelize so well.
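For readers who want a concrete feel for the partition / distribute / re-assemble idea before clicking through, below is a minimal base-R sketch of the same pattern using only the parallel package. It is not John Mount's code: the example data, the subject_id column, the chunking rule, and the per-group summary are invented stand-ins, and wrapr::partition_tables / wrapr::execute_parallel (plus, for rquery pipelines, rqdatatable::ex_data_table_parallel) bundle the partitioning and dispatch steps so you do not write them by hand.

    # A minimal base-R sketch of the partition / distribute / re-assemble
    # pattern described above, using only the parallel package. The data,
    # the subject_id grouping column, the chunking rule, and the summary
    # step are invented for illustration; they are not from the post.
    library(parallel)

    set.seed(2018)
    d <- data.frame(
      subject_id = rep(1:1000, each = 10),  # grouping key; rows from different
      value      = rnorm(10000)             # subjects are unrelated
    )

    # Per-partition work: summarize each subject inside the chunk a worker gets.
    summarize_chunk <- function(di) {
      do.call(rbind, lapply(split(di, di$subject_id), function(g) {
        data.frame(subject_id = g$subject_id[[1]], mean_value = mean(g$value))
      }))
    }

    # 1. Partition un-related rows into a handful of chunks, one per worker.
    chunks <- split(d, (d$subject_id - 1) %/% 250)

    # 2. Distribute the chunks to separate worker processes.
    cl <- makeCluster(4)
    res_list <- parLapply(cl, chunks, summarize_chunk)
    stopCluster(cl)

    # 3. Re-assemble the per-chunk results into one data.frame.
    res <- do.call(rbind, res_list)
    head(res)

The post parallelizes these same three steps, with data.table and dplyr pipelines substituted in as the per-partition work.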