Dplyr Tutorial

Deepanshu Bhalla has a nice dplyr tutorial:

What is dplyr?
dplyr is a powerful R-package to manipulate, clean and summarize unstructured data. In short, it makes data exploration and data manipulation easy and fast in R.
What’s special about dplyr?

The package “dplyr” comprises many functions that perform mostly used data manipulation operations such as applying filter, selecting specific columns, sorting data, adding or deleting columns and aggregating data. Another most important advantage of this package is that it’s very easy to learn and use dplyr functions. Also easy to recall these functions. For example, filter() is used to filter rows.

dplyr is a core package when it comes to data cleansing in R, and the more of it you can internalize, the faster you’ll move in that language.

Related Posts

The Basics Of PCA In R

Prashant Shekhar gives us an overview of Principal Component Analysis using R: PCA changes the axis towards the direction of maximum variance and then takes projection on this new axis. The direction of maximum variance is represented by Principal Components (PC1). There are multiple principal components depending on the number of dimensions (features) in the […]

Read More

Tidy Data Is Normalized Data

I emphasize the link between a tidy dataframe and a normalized data structure: The kicker, as Wickham describes on pages 4-5, is that normalization is a critical part of tidying data.  Specifically, Wickham argues that tidy data should achieve third normal form. Now, in practice, Wickham argues, we tend to need to denormalize data because […]

Read More


February 2017
« Jan Mar »