This test shows that we can reject the null that the variance of the residuals is constant, thus heteroskedacity is present. To get the correct standard errors, we can use the
vcovHC()
function from the{sandwich}
package (hence the choice for the header picture of this post):lmfit %>% vcovHC() %>% diag() %>% sqrt()
## (Intercept) regionnortheast regionsouth regionwest ## 311.31088691 25.30778221 23.56106307 24.12258706 ## residents young_residents per_capita_income ## 0.09184368 0.68829667 0.02999882
By default
vcovHC()
estimates a heteroskedasticity consistent (HC) variance covariance matrix for the parameters. There are several ways to estimate such a HC matrix, and by defaultvcovHC()
estimates the “HC3” one. You can refer to Zeileis (2004) for more details.We see that the standard errors are much larger than before! The intercept and
regionwest
variables are not statistically significant anymore.
The biggest problem with heteroskedasticity is that it can introduce bias in error terms. That’s not the end of the world, but if the level of heteroskedasticity is serious enough, we want to find ways to account for it. H/T R-Bloggers.