So far so good. Let’s now remove the “intercept term” by adding “0+” to the fitting command.

m2 <- lm(y~0+x, data=d)
t(broom::glance(m2))

##                        [,1]
## r.squared      7.524811e-01
## adj.r.squared  7.474297e-01
## sigma          3.028515e-01
## statistic      1.489647e+02
## p.value        1.935559e-30
## df             2.000000e+00
## logLik        -2.143244e+01
## AIC            4.886488e+01
## BIC            5.668039e+01
## deviance       8.988464e+00
## df.residual    9.800000e+01

d$pred2 <- predict(m2, newdata = d)

Uh oh. That appeared to vastly improve the reported R-squared and the significance (“p.value”)!
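As a preview of what is going on, here is a minimal sketch on hypothetical simulated data (the name d_sim and its generating coefficients are assumptions, not the article's d). It relies on one documented behavior: when a model has no intercept, R's summary.lm() computes R-squared against the constant zero, not against mean(y). The sketch recomputes both the uncentered and the centered versions by hand for the no-intercept fit.

```r
# Hypothetical simulated data (NOT the article's d): y has a clearly nonzero mean.
set.seed(2017)
n <- 100
x <- rnorm(n, mean = 1)
y <- 0.5 + 0.2 * x + rnorm(n, sd = 0.3)
d_sim <- data.frame(x = x, y = y)

m_int   <- lm(y ~ x, data = d_sim)      # ordinary fit, with intercept
m_noint <- lm(y ~ 0 + x, data = d_sim)  # intercept removed, as with m2

rss <- sum(residuals(m_noint)^2)

# Uncentered R-squared: baseline is "always predict 0".
r2_uncentered <- 1 - rss / sum(d_sim$y^2)
# Centered R-squared: baseline is "always predict mean(y)".
r2_centered <- 1 - rss / sum((d_sim$y - mean(d_sim$y))^2)

# The R-squared R reports for the no-intercept model.
reported <- summary(m_noint)$r.squared

print(c(reported = reported,
        uncentered = r2_uncentered,
        centered = r2_centered,
        with_intercept = summary(m_int)$r.squared))
```

The reported value should match the uncentered one, because zero is a much weaker baseline than mean(y) whenever y is far from zero. Measured against mean(y), the no-intercept fit can never beat the model that kept its intercept, and its centered R-squared can even be negative.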
Read on to learn why this happens and how you can prevent this from tricking you in the future.