Dario Radecic takes us through an interesting library:
In a world where compute time is billed by the second, make every one of them count. There are zero valid reasons to utilize a quarter of your CPU and memory, but achieving complete resource utilization isn’t always a straightforward task. That is if you don’t know about R dtplyr.
One option is to use dplyr. It’s simple to use and has intuitive syntax. But it’s slow. The other option is to use data.table. It’s lightning-fast but has a steep learning curve and syntax that’s not too friendly to follow. The third – and your best option – is to combine the simplicity of dplyr with efficiency of data.table. And that’s where R dtplyr chimes in!
Today you’ll learn just how easy it is to switch from dplyr to dtplyr, and you’ll see hands-on the performance differences between the two. Let’s dig in!
I love the performance of the data.table library but strongly prefer the Tidyverse for the sake of convenience. I like that this bridges the gap, at least for dplyr-style processing. H/T R-Bloggers.
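For a sense of what the switch looks like, here is a minimal sketch, assuming the dplyr, dtplyr, and data.table packages are installed. The sales data frame and its column names are made up for illustration and do not come from the linked post. The only change from plain dplyr is wrapping the data in lazy_dt() and collecting the result at the end.

```r
library(dplyr)
library(dtplyr)

# Hypothetical data, invented for this example
sales <- data.frame(
  region = c("north", "south", "north", "south"),
  amount = c(10, 20, 30, 40)
)

# Plain dplyr pipeline
by_region_dplyr <- sales %>%
  group_by(region) %>%
  summarise(total = sum(amount)) %>%
  as.data.frame()

# Same verbs via dtplyr: lazy_dt() records the steps and translates
# them to data.table code, which runs when the result is collected
by_region_dtplyr <- sales %>%
  lazy_dt() %>%
  group_by(region) %>%
  summarise(total = sum(amount)) %>%
  as.data.frame()
```

Calling show_query() on the lazy pipeline before collecting it prints the data.table code that dtplyr generates, which is a handy way to pick up data.table syntax along the way.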