John Mount shows us that data.table
is way faster for sorting than dplyr
‘s arrange
function:
Notice on the above semi-log plot the run time ratio is growing roughly linearly. This makes sense:
data.table
uses a radix sort which has the potential to perform in near linear time (faster than then log(n)
lower bound known comparison sorting) for a range of problems (also we are only showing example sorting times, not worst-case sorting times).In fact, if we divide the
y
in the above graph bylog(rows)
we get something approaching a constant.
John has also provided us with a markdown document for comparison.