dplyr Mutate Quirks

Kevin Feasel

2017-09-25

R

John Mount explains a quirk in dplyr’s mutate function:

It is hard for experts to understand how frustrating the above is to a new R user or to a part time R user. It feels like any variation on the original code causes it to fail. None of the rules they have been taught anticipate this, or tell them how to get out of this situation.

This quickly leads to strong feelings of learned helplessness and anxiety.

Our rule for dplyr::mutate() has been for some time:

Each column name used in a single mutate must appear only on the left-hand-side of a single assignment, or otherwise on the right-hand-side of any number of assignments (but never both sides, even if it is different assignments).

If you do data analysis with R, you’ve probably run into this before. I certainly have, and it’s nice to understand why this is the case.
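The excerpt above does not include the code that triggered the problem, so here is a minimal sketch of what following the quoted rule might look like. The data frame `d` and the columns `x`, `y`, and `z` are hypothetical names chosen for illustration. On a local, in-memory data frame the single-mutate form generally works as expected; the split form is simply a defensive way to satisfy the rule wherever the quirk applies.

```r
library(dplyr)

# Hypothetical data; the column names are illustrative only.
d <- data.frame(x = c(1, 2, 3))

# Pattern the rule warns against: `y` is assigned and then read
# inside the same mutate() call.
# d %>% mutate(y = x + 1, z = y * 2)

# Pattern that follows the rule: within each mutate(), a column is
# either assigned or read, never both.
result <- d %>%
  mutate(y = x + 1) %>%
  mutate(z = y * 2)

print(result)
```

Splitting the work into consecutive `mutate()` calls costs a little extra typing, but each step then depends only on columns that already exist before it runs.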
