Forcing 0 Intercept Inflates R-squared In R

John Mount has an informative post on how you can trick yourself when running linear regression models in R by forcing the y-intercept to be 0.
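The quoted excerpt below fits a model m2 on a data frame d that is not shown here. For context, here is a minimal sketch of a comparable setup with simulated data (the names and values are assumptions, not the original post's data; the "so far so good" presumably refers to an ordinary fit such as m1):

# Simulated data (an assumption for illustration): a simple linear
# relationship with a nonzero intercept and some noise.
set.seed(2017)
d <- data.frame(x = runif(100))
d$y <- 0.5 + 0.2 * d$x + rnorm(100, sd = 0.3)

# Ordinary fit, intercept included -- the baseline the excerpt builds on.
m1 <- lm(y ~ x, data = d)
t(broom::glance(m1))

With something like that in place, here is the relevant excerpt from the post: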

So far so good. Let’s now remove the “intercept term” by adding the “0+” to the fitting command.

m2 <- lm(y~0+x, data=d)
t(broom::glance(m2))
## [,1]
## r.squared 7.524811e-01
## adj.r.squared 7.474297e-01
## sigma 3.028515e-01
## statistic 1.489647e+02
## p.value 1.935559e-30
## df 2.000000e+00
## logLik -2.143244e+01
## AIC 4.886488e+01
## BIC 5.668039e+01
## deviance 8.988464e+00
## df.residual 9.800000e+01
d$pred2 <- predict(m2, newdata = d)

Uh oh. That appeared to vastly improve the reported R-squared and the significance (“p.value”)!

Read on to learn why this happens and how you can prevent this from tricking you in the future.
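The short version: for a model without an intercept, the R-squared that summary() (and therefore broom::glance()) reports is computed against a baseline prediction of zero, 1 - RSS/sum(y^2), rather than against the mean of y, 1 - RSS/sum((y - mean(y))^2). Since sum(y^2) is typically much larger than the variation around the mean, the ratio looks far better even though the fit is not. Assuming the simulated d and the no-intercept fit m2 above, a quick sketch of the two calculations:

rss2 <- sum(residuals(m2)^2)

# R-squared as reported for a no-intercept model: baseline is predicting 0.
1 - rss2 / sum(d$y^2)

# R-squared measured against the mean of y, comparable across both models.
1 - rss2 / sum((d$y - mean(d$y))^2)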

