Forcing 0 Intercept Inflates R-squared In R

John Mount has an informative post on how you can trick yourself when running linear regression models in R and forcing the y intercept to be 0:

So far so good. Let’s now remove the “intercept term” by adding the “0+” from the fitting command.

m2 <- lm(y~0+x, data=d)t(broom::glance(m2))
## [,1]
## r.squared 7.524811e-01
## adj.r.squared 7.474297e-01
## sigma 3.028515e-01
## statistic 1.489647e+02
## p.value 1.935559e-30
## df 2.000000e+00
## logLik -2.143244e+01
## AIC 4.886488e+01
## BIC 5.668039e+01
## deviance 8.988464e+00
## df.residual 9.800000e+01
d$pred2 <- predict(m2, newdata = d)

Uh oh. That appeared to vastly improve the reported R-squared and the significance (“p.value“)!

Read on to learn why this happens and how you can prevent this from tricking you in the future.

Related Posts

Calculating Lifetime Value With R

Sergey Bryl shows how to calculate the lifetime value of a subscription service: Predicting LTV is a common issue for a new, recently launched product/service/application when we don’t have a lot of historical data but want to calculate LTV as soon as possible. Even though we may have a lot of historical data on customer […]

Read More

Interpreting The Area Under The Receiver Operating Characteristic Curve

Roos Colman explains what a Receiver Operating Characteristic (ROC) curve is and how we interpret the Area Under the Curve (AUC): The AUC can be defined as “The probability that a randomly selected case will have a higher test result than a randomly selected control”. Let’s use this definition to calculate and visualize the estimated […]

Read More


June 2017
« May Jul »