Dario Radecic takes us through an interesting library:
In a world where compute time is billed by the second, make every one of them count. There are zero valid reasons to utilize a quarter of your CPU and memory, but achieving complete resource utilization isn’t always a straightforward task. That is, if you don’t know about R dtplyr.
One option is to use dplyr. It’s simple to use and has intuitive syntax. But it’s slow. The other option is to use data.table. It’s lightning-fast but has a steep learning curve and syntax that’s not too friendly to follow. The third – and your best option – is to combine the simplicity of dplyr with efficiency of data.table. And that’s where R dtplyr chimes in!
Today you’ll learn just how easy it is to switch from dplyr to dtplyr, and you’ll see hands-on the performance differences between the two. Let’s dig in!
I love the performance of the data.table library but strongly prefer the Tidyverse for the sake of convenience. I like that this bridges the gap, at least for dplyr-style processing. H/T R-Bloggers.
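To give a feel for how small the switch is, here is a minimal sketch that runs the same grouped summary with plain dplyr and then with dtplyr. The `sales` data frame and its column names are made up for illustration and are not from Dario's post; the sketch assumes the dplyr, dtplyr, and data.table packages are installed.

```r
library(dplyr)
library(dtplyr)

# Hypothetical toy data, invented for this sketch (not from the linked article)
sales <- data.frame(
  region = c("north", "south", "north", "south"),
  amount = c(10, 20, 30, 40)
)

# Plain dplyr: verbs are evaluated eagerly on the data frame
dplyr_result <- sales %>%
  group_by(region) %>%
  summarise(total = sum(amount)) %>%
  as_tibble()

# dtplyr: wrap the data in lazy_dt(), keep the same dplyr verbs,
# and force evaluation at the end with as_tibble()
dtplyr_result <- sales %>%
  lazy_dt() %>%
  group_by(region) %>%
  summarise(total = sum(amount)) %>%
  as_tibble()

print(dplyr_result)
print(dtplyr_result)
```

The only differences are the `lazy_dt()` call at the top of the pipeline and the explicit collection step at the end. Swapping `as_tibble()` for `show_query()` in the dtplyr pipeline should print the data.table expression that the verbs were translated into, which is a handy way to pick up data.table syntax along the way. Any speedup will depend on data size; a four-row data frame like this one will not show it.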