This test shows that we can reject the null that the variance of the residuals is constant, thus heteroskedacity is present. To get the correct standard errors, we can use the

`vcovHC()`

function from the`{sandwich}`

package (hence the choice for the header picture of this post):`lmfit %>% vcovHC() %>% diag() %>% sqrt()`

`## (Intercept) regionnortheast regionsouth regionwest ## 311.31088691 25.30778221 23.56106307 24.12258706 ## residents young_residents per_capita_income ## 0.09184368 0.68829667 0.02999882`

By default

`vcovHC()`

estimates a heteroskedasticity consistent (HC) variance covariance matrix for the parameters. There are several ways to estimate such a HC matrix, and by default`vcovHC()`

estimates the “HC3” one. You can refer to Zeileis (2004) for more details.We see that the standard errors are much larger than before! The intercept and

`regionwest`

variables are not statistically significant anymore.

The biggest problem with heteroskedasticity is that it can introduce bias in error terms. That’s not the end of the world, but if the level of heteroskedasticity is serious enough, we want to find ways to account for it. H/T R-Bloggers.

Kevin Feasel

2018-07-09

Data Science, R