Couple of packages I will mention for data manipulations are plyr, dplyr and data.table and compare the execution time, simplicity and ease of writing with general T-SQL code and RevoScaleR package. For this blog post I will use R packagedplyr and T-SQL with possibilites of RevoScaleR computation functions.
My initial query will be. Available in WideWorldImportersDW database. No other alterations have been done to underlying tables (fact.sale or dimension.city).
Read on for code and conclusions. I don’t think there are any shocking conclusions: the upshot is to filter data as early as possible.