Values Belong In Columns

Kevin Feasel



John Mount argues that, to reduce ambiguity, you should keep your values in columns of the appropriate data frames:

Here is an (artificial) example.

chamber_sizes <- mtcars$disp/mtcars$cyl
form <- hp ~ chamber_sizes
model <- lm(form, data = mtcars)
print(model)
# Call:
# lm(formula = form, data = mtcars)
# Coefficients:
# (Intercept) chamber_sizes
# 2.937 4.104 

Notice: one of the variables came from a vector in the environment, not from the primary data.frame. chamber_sizes was first looked for in the data.frame, then in the environment where the formula was defined (which happens to be the global environment), and (if that hadn't worked) in the executing environment (which is again the global environment).
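To see why this lookup order is risky, here is a small sketch of our own (not from Mount's post). It shows the same formula text picking up different data depending on whether a same-named column exists, and a formula built inside a function silently using that function's local variable. The data frame `shadow` and the function `make_formula` are hypothetical names used only for illustration.

```r
# Sketch: the same formula text can resolve `chamber_sizes` from different places.
chamber_sizes <- mtcars$disp / mtcars$cyl   # loose vector in the global environment

# 1. mtcars has no chamber_sizes column, so the global vector is used.
m_env <- lm(hp ~ chamber_sizes, data = mtcars)

# 2. Hypothetical copy of mtcars with a same-named column holding different values.
shadow <- mtcars
shadow$chamber_sizes <- rev(chamber_sizes)

# The column takes precedence; the global vector is silently ignored.
m_col <- lm(hp ~ chamber_sizes, data = shadow)

# 3. A formula remembers the environment it was created in.
make_formula <- function() {
  chamber_sizes <- mtcars$disp   # hypothetical local variable with a different meaning
  hp ~ chamber_sizes
}
# Uses the function's local chamber_sizes (plain displacement), not the global ratio.
m_fun <- lm(make_formula(), data = mtcars)

print(coef(m_env))
print(coef(m_col))
print(coef(m_fun))
```

All three calls read `hp ~ chamber_sizes`, yet each fits a different regressor, which is exactly the ambiguity the advice below is meant to remove.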

Our advice is: do not do that. Place all of your values in columns. Make it unambiguous that all variables are names of columns in your data.frame of interest. This allows you to write simple code that works over explicit data. The style we recommend looks like the following.
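Mount's own recommended code is not reproduced in this excerpt. As a minimal sketch of the idea (our illustration, not necessarily his exact code), the derived ratio is stored as a column first, so the formula only refers to columns of the data.frame passed to `lm()`. The copy `d` is a hypothetical name.

```r
# Sketch of the "values belong in columns" style: derive the value as a column first.
d <- mtcars
d$chamber_sizes <- d$disp / d$cyl   # the derived value now lives in the data.frame

# Every variable in the formula is now a column of `d`; nothing comes from the environment.
model <- lm(hp ~ chamber_sizes, data = d)
print(model)
```

Because the values and their order are identical, this fits the same coefficients as the vector-based version, but the model no longer depends on whatever happens to be defined in the surrounding environment. Base R's `transform(mtcars, chamber_sizes = disp / cyl)` would be an equivalent way to add the column.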

Read the whole thing.



September 2018