Better Grouping With dplyr

John Mount builds a function to improve upon the group-by to mutate model in dplyr:

The advantages of the shorthand are:

  • The analyst only has to specify the grouping column once.
  • The data (mtcars) enters the pipeline only once.
  • The analyst doesn’t have to start thinking about joins immediately.

Frankly I’ve never liked the shorthand. I feel it is a “magic extra” that a new user would have no way of anticipating from common use of group_by() and summarize(). I very much like the idea of wrapping this important common use case into a single verb. Adjoining “windowed” or group-calculated columns is a common and important step in analysis, and well worth having its own verb.

Below is our attempt at elevating this pattern into a packaged verb.

Click through for the script.  I’d like to see something like this make its way into dplyr.

Related Posts

Random Forests In R

Anish Sing Walia explains the basics of random forests and provides sample code in R: Random Forests are similar to a famous Ensemble technique called Bagging but have a different tweak in it. In Random Forests the idea is to decorrelate the several trees which are generated on the different bootstrapped samples from training Data.And […]

Read More

Using seplyr Instead Of dplyr

John Mount explains seplyr and why it can be better for certain use cases than dplyr: seplyr is a dplyr adapter layer that prefers “slightly clunkier” standard interfaces (or referentially transparent interfaces), which are actually very powerful and can be used to some advantage. The above description and comparisons can come off as needlessly broad and painfully abstract. Things are […]

Read More

Leave a Reply

Your email address will not be published. Required fields are marked *

Categories

July 2017
MTWTFSS
« Jun  
 12
3456789
10111213141516
17181920212223
24252627282930
31