John Mount shows us that data.table is way faster for sorting than dplyr‘s arrange function:
Notice on the above semi-log plot the run time ratio is growing roughly linearly. This makes sense:
data.tableuses a radix sort which has the potential to perform in near linear time (faster than then log(n)lower bound known comparison sorting) for a range of problems (also we are only showing example sorting times, not worst-case sorting times).In fact, if we divide the
yin the above graph bylog(rows)we get something approaching a constant.
John has also provided us with a markdown document for comparison.