So far so good. Let’s now remove the “intercept term” by adding “0+” to the fitting command.
m2 <- lm(y~0+x, data=d)
t(broom::glance(m2))
##                        [,1]
## r.squared      7.524811e-01
## adj.r.squared  7.474297e-01
## sigma          3.028515e-01
## statistic      1.489647e+02
## p.value        1.935559e-30
## df             2.000000e+00
## logLik        -2.143244e+01
## AIC            4.886488e+01
## BIC            5.668039e+01
## deviance       8.988464e+00
## df.residual    9.800000e+01
d$pred2 <- predict(m2, newdata = d)
Uh oh. That appeared to vastly improve the reported R-squared and the significance (“p.value”)!
Read on to learn why this happens and how you can prevent this from tricking you in the future.
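As a quick preview, here is a minimal sketch of how one might check the reported number by hand. It assumes the behavior of base R's summary.lm(), which for a model without an intercept measures R-squared against the constant prediction 0 rather than against mean(y). The data frame d_demo, the model m_demo, and the simulation parameters are hypothetical stand-ins, not the d used in this article.

```r
# Hypothetical stand-in data: the article's own `d` is not reproduced here.
set.seed(2024)
d_demo <- data.frame(x = runif(100))
d_demo$y <- 1 + 0.1 * d_demo$x + rnorm(100, sd = 0.3)

# Fit without an intercept, as in the m2 example.
m_demo <- lm(y ~ 0 + x, data = d_demo)
pred <- predict(m_demo, newdata = d_demo)
rss <- sum((d_demo$y - pred)^2)  # residual sum of squares

# R-squared with the constant 0 as the baseline model
# (what summary.lm() uses when there is no intercept).
r2_zero_baseline <- 1 - rss / sum(d_demo$y^2)

# R-squared with mean(y) as the baseline model (the usual definition).
r2_mean_baseline <- 1 - rss / sum((d_demo$y - mean(d_demo$y))^2)

print(c(reported      = summary(m_demo)$r.squared,
        zero_baseline = r2_zero_baseline,
        mean_baseline = r2_mean_baseline))
```

In this sketch the reported value should agree with the zero-baseline calculation, and the mean-baseline value can never exceed it, since the sum of squares of y around 0 is at least as large as the sum of squares around mean(y).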