Dplyr Tutorial

Deepanshu Bhalla has a nice dplyr tutorial:

What is dplyr?
dplyr is a powerful R-package to manipulate, clean and summarize unstructured data. In short, it makes data exploration and data manipulation easy and fast in R.
What’s special about dplyr?

The package “dplyr” comprises many functions that perform mostly used data manipulation operations such as applying filter, selecting specific columns, sorting data, adding or deleting columns and aggregating data. Another most important advantage of this package is that it’s very easy to learn and use dplyr functions. Also easy to recall these functions. For example, filter() is used to filter rows.

dplyr is a core package when it comes to data cleansing in R, and the more of it you can internalize, the faster you’ll move in that language.

Related Posts

Reinforcement Learning with R

Holger von Jouanne-Diedrich takes us through concepts in reinforcement learning: At the core this can be stated as the problem a gambler has who wants to play a one-armed bandit: if there are several machines with different winning probabilities (a so-called multi-armed bandit problem) the question the gambler faces is: which machine to play? He could “exploit” one […]

Read More

Biases in Tree-Based Models

Nina Zumel looks at tree-based ensembling models like random forest and gradient boost and shows that they can be biased: In our previous article , we showed that generalized linear models are unbiased, or calibrated: they preserve the conditional expectations and rollups of the training data. A calibrated model is important in many applications, particularly when financial data […]

Read More


February 2017
« Jan Mar »