Multidplyr

Kevin Feasel

2016-12-19

R

Matt Dancho shows how to use multidplyr to perform parallel processing on data cleansing activities:

There’s nothing more frustrating than waiting for long-running R scripts to iteratively run. I’ve recently come across a new-ish package for parallel processing that plays nicely with the tidyverse: multidplyr. The package has saved me countless hours when applied to long-running, iterative scripts. In this post, I’ll discuss the workflow to parallelize your code, and I’ll go through a real world example of collecting stock prices where it improves speed by over 5X for a process that normally takes 2 minutes or so. Once you grasp the workflow, the parallelization can be applied to almost any iterative scripts regardless of application.

This is a longer article, but if you’re using dplyr with R today, it’s worth a read.

Related Posts

Python versus R (Again)

Alex Woodie looks at whether Python is dominating R in the data science space: There is some evidence that Python’s popularity is hurting R usage. According to the TIOBE Index, Python is currently the third most popular language in the world, behind perennial heavyweights Java and C. From August 2018 to August 2019, Python usage surged […]

Read More

Local Randomness and R

Evgeni Chasnovski has a problem around generating random data: Let’s say we have a deterministic (non-random) problem for which one of the solutions involves randomness. One very common example of such problem is a function minimization on certain interval: it can be solved non-randomly (like in most methods of optim()), or randomly (the simplest approach being […]

Read More

Categories

December 2016
MTWTFSS
« Nov Jan »
 1234
567891011
12131415161718
19202122232425
262728293031