John Mount shares a few examples of partitioning and parallelizing data operations in R:
In this note we will show how to speed up work in R by partitioning data and using process-level parallelization. We will show the technique with three different R packages: rqdatatable, data.table, and dplyr. The methods shown will also work with base-R and other packages.
For each of the above packages we speed up work by using wrapr::execute_parallel, which in turn uses wrapr::partition_tables to partition unrelated data.frame rows and then distributes them to different processors to be executed. rqdatatable::ex_data_table_parallel conveniently bundles all of these steps together when working with rquery pipelines.
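The partition, distribute, and re-assemble pattern can be sketched with base-R alone. This is a minimal illustration using the standard parallel package (split() plus parLapply()), not the wrapr::execute_parallel API itself. The example data, the group_id key, and the summarize_partition() helper are hypothetical; the sketch assumes rows that share a group_id are related and rows with different keys can be processed independently.

```r
library(parallel)

# Hypothetical example data: rows sharing a group_id are related,
# rows with different group_id values are independent of each other.
set.seed(2018)
d <- data.frame(
  group_id = rep(1:8, each = 1000),
  x = runif(8000)
)

# Hypothetical per-partition work: it only needs the rows of one group,
# so each partition can be processed on a separate worker.
summarize_partition <- function(part) {
  data.frame(
    group_id = part$group_id[[1]],
    n = nrow(part),
    mean_x = mean(part$x)
  )
}

# 1. Partition: split rows by the key so related rows stay together.
parts <- split(d, d$group_id)

# 2. Distribute: start two worker processes and send one partition
#    per task; the cluster is shut down even if a task fails.
cl <- makeCluster(2)
results <- tryCatch(
  parLapply(cl, parts, summarize_partition),
  finally = stopCluster(cl)
)

# 3. Re-assemble: bind the per-partition results into one data.frame.
res <- do.call(rbind, results)
rownames(res) <- NULL
print(res)
```

Because each partition is shipped to a separate R process, the speedup depends on the per-partition work being large relative to the cost of serializing the data to the workers; for very small tables the overhead can dominate.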
There were some interesting results. I expected data.table to be fast, but did not expect dplyr to parallelize so well.