There are many useful functions in the dplyr package. This post does not attempt to cover them all, but it does look at the major functions commonly used in data manipulation tasks: select(), filter(), mutate(), group_by(), summarise(), arrange() and the join family (dplyr has no single join() function; instead it provides left_join(), inner_join(), right_join(), full_join() and related verbs).
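As a rough illustration of how these verbs chain together, here is a minimal sketch on a small hypothetical data frame. The data frame `people`, its columns (`age`, `education`, `hours_per_week`, `income`) and the lookup table `edu_levels` are made up for this example and are not rows from the census file. `left_join()` stands in for the join family, and the sketch assumes dplyr is installed.

```r
library(dplyr)  # also re-exports the %>% pipe

# Hypothetical toy data, loosely modelled on census-style columns;
# these are NOT rows from the UCI dataset.
people <- data.frame(
  id             = 1:6,
  age            = c(25, 38, 28, 44, 18, 52),
  education      = c("Bachelors", "HS-grad", "Masters", "HS-grad",
                     "Some-college", "Bachelors"),
  hours_per_week = c(40, 50, 40, 45, 20, 60),
  income         = c("<=50K", "<=50K", ">50K", ">50K", "<=50K", ">50K"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table, used only to demonstrate a join.
edu_levels <- data.frame(
  education = c("HS-grad", "Some-college", "Bachelors", "Masters"),
  edu_rank  = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

summary_tbl <- people %>%
  select(age, education, hours_per_week, income) %>%  # keep only these columns
  filter(age >= 21) %>%                               # drop rows where age is under 21
  mutate(overtime = hours_per_week > 40) %>%          # add a derived logical column
  left_join(edu_levels, by = "education") %>%         # attach edu_rank by matching education
  group_by(income) %>%                                # one group per income class
  summarise(
    n              = n(),                             # rows in each group
    mean_age       = mean(age),
    share_overtime = mean(overtime),                  # proportion of TRUE values
    mean_edu_rank  = mean(edu_rank),
    .groups        = "drop"                           # return an ungrouped result
  ) %>%
  arrange(desc(mean_age))                             # oldest income group first

print(summary_tbl)
```

Each verb takes a data frame and returns a new one, which is what lets the pipe read top to bottom as a sequence of steps. On this toy data the result has two rows, one per income class, with the `>50K` group listed first because its mean age is higher.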
The data used in this post are taken from the UCI Machine Learning Repository and contain 1994 census information for the USA. The dataset can be used to classify income class in a machine learning setting and can be obtained here.
These seven are probably the bare minimum you should know about dplyr, but knowing even just these can make data analysis in R much easier.