Forcing 0 Intercept Inflates R-squared In R

John Mount has an informative post on how you can trick yourself when running linear regression models in R and forcing the y intercept to be 0:

So far so good. Let’s now remove the “intercept term” by adding the “0+” from the fitting command.

m2 <- lm(y~0+x, data=d)t(broom::glance(m2))
## [,1]
## r.squared 7.524811e-01
## adj.r.squared 7.474297e-01
## sigma 3.028515e-01
## statistic 1.489647e+02
## p.value 1.935559e-30
## df 2.000000e+00
## logLik -2.143244e+01
## AIC 4.886488e+01
## BIC 5.668039e+01
## deviance 8.988464e+00
## df.residual 9.800000e+01
d$pred2 <- predict(m2, newdata = d)

Uh oh. That appeared to vastly improve the reported R-squared and the significance (“p.value“)!

Read on to learn why this happens and how you can prevent this from tricking you in the future.

Related Posts

Probabilities And Poker

Steve Miller has a notebook on 5-card draw probabilities: The population of 5 card draw hands, consisting of 52 choose 5 or 2598960 elements, is pretty straightforward both mathematically and statistically. So of course ever the geek, I just had to attempt to show her how probability and statistics converge. In addition to explaining the […]

Read More

There Is No Easy Button With Predictive Analytics

Scott Mutchler dispels some myths: There are a couple of myths that I see more an more these days.  Like many myths they seem plausible on the surface but experienced data scientist know that the reality is more nuanced (and sadly requires more work). Myths: Deep learning (or Cognitive Analytics) is an easy button.  You […]

Read More


June 2017
« May Jul »