Slicing In R

Kevin Feasel



John Mount recommends learning about the array slicing system in R:

R has a very powerful array slicing ability that allows for some very slick data processing.

Suppose we have a data.frame “d“, and for every row where d$n_observations < 5 we wish to “NA-out” some other columns (mark them as not yet reliably available). Using slicing techniques this can be done quite quickly as follows.

d[d$n_observations < 5, qc(mean_cost, mean_revenue, mean_duration)] <- NA

Read on for more.  In general, I prefer the pipeline mechanics offered with the Tidyverse for readability.  But this is a good example of why you should know both styles.

Related Posts

From Excel to R: Three Examples

Abdul Majed Raja has a few examples of things which are easy to do in Excel and how you can do them in R: Create a difference variable between the current value and the next valueThis is also known as lead and lag – especially in a time series dataset this varaible becomes very important in feature engineering. In […]

Read More

Calculating AUC in R

Andrew Treadway shows how you can calculate Area Under the Curve in R: AUC is an important metric in machine learning for classification. It is often used as a measure of a model’s performance. In effect, AUC is a measure between 0 and 1 of a model’s performance that rank-orders predictions from a model. For […]

Read More


April 2018
« Mar May »