John Mount shares some performance measures pitting data.table against various dplyr methods for calculating grouped means:
In this reproduction attempt we see:
– The dplyr time being around 0.05 seconds. This is about 5 times slower than claimed.
– The dplyr sum()/n() time is about 0.2 seconds, about 5 times faster than claimed.
– The data.table time being around 0.004 seconds. This is about three times as fast as the dplyr claims, and over ten times as fast as the actual observed dplyr behavior.
Read the whole thing. If you want to replicate it yourself, check out the RMarkdown file.
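
For a rough sense of what is being compared, here is a minimal sketch of the three grouped-mean approaches. It is not John Mount's benchmark code: the data, the column names `g` and `x`, the row and group counts, and the `time_it` helper are all assumptions made for illustration, so the timings it prints will not match the figures quoted above.

```r
# Illustrative sketch only: the data shape and all names here are assumptions,
# not taken from the original RMarkdown file.
library(dplyr)
library(data.table)

set.seed(2018)
n_rows <- 1e5
d <- data.frame(
  g = sample.int(1000, n_rows, replace = TRUE),  # hypothetical grouping column
  x = runif(n_rows)                              # hypothetical value column
)
dt <- as.data.table(d)

# dplyr, calling mean() directly inside summarize()
dplyr_mean <- function() {
  d %>% group_by(g) %>% summarize(x = mean(x))
}

# dplyr, computing the mean as sum()/n()
dplyr_sum_n <- function() {
  d %>% group_by(g) %>% summarize(x = sum(x) / n())
}

# data.table grouped mean
dt_mean <- function() {
  dt[, .(x = mean(x)), by = g]
}

# Median elapsed seconds over a few repetitions (hypothetical helper)
time_it <- function(f, reps = 5) {
  median(vapply(
    seq_len(reps),
    function(i) system.time(f())[["elapsed"]],
    numeric(1)
  ))
}

timings <- c(
  dplyr_mean  = time_it(dplyr_mean),
  dplyr_sum_n = time_it(dplyr_sum_n),
  data_table  = time_it(dt_mean)
)
print(timings)
```

Results will vary with package versions, hardware, and the number of groups, so treat any single run as indicative at best. A dedicated timing package would give more reliable measurements than repeated `system.time()` calls.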