Values Belong In Columns

John Mount argues that to reduce ambiguity, ensure that your values are columns on appropriate data frames:

Here is an (artificial) example.

chamber_sizes <- mtcars$disp/mtcars$cyl
form <- hp ~ chamber_sizes
model <- lm(form, data = mtcars)
# Call:
# lm(formula = form, data = mtcars)
# Coefficients:
# (Intercept) chamber_sizes
# 2.937 4.104 

Notice: one of the variables came from a vector in the environment, not from the primary data.framechamber_sizes was first looked for in the data.frame, and then in the environment the formula was defined (which happens to be the global environment), and (if that hadn’t worked) in the executing environment (which is again the global environment).

Our advice is: do not do that. Place all of your values in columns. Make it unambiguous all variables are names of columns in your data.frame of interest. This allows you to write simple code that works over explicit data. The style we recommend looks like the following.

Read the whole thing.

Related Posts

Calculating Lifetime Value With R

Sergey Bryl shows how to calculate the lifetime value of a subscription service: Predicting LTV is a common issue for a new, recently launched product/service/application when we don’t have a lot of historical data but want to calculate LTV as soon as possible. Even though we may have a lot of historical data on customer […]

Read More

Interpreting The Area Under The Receiver Operating Characteristic Curve

Roos Colman explains what a Receiver Operating Characteristic (ROC) curve is and how we interpret the Area Under the Curve (AUC): The AUC can be defined as “The probability that a randomly selected case will have a higher test result than a randomly selected control”. Let’s use this definition to calculate and visualize the estimated […]

Read More

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.


September 2018
« Aug