Press "Enter" to skip to content

Dealing With Heteroskedasticity

Bruno Rodrigues explains the notion of heteroskedasticity and shows ways of dealing with this issue in a linear regression:

This test shows that we can reject the null that the variance of the residuals is constant, thus heteroskedacity is present. To get the correct standard errors, we can use the vcovHC() function from the {sandwich} package (hence the choice for the header picture of this post):

lmfit %>% 
    vcovHC() %>% 
    diag() %>% 
    sqrt()
##       (Intercept)   regionnortheast       regionsouth        regionwest 
##      311.31088691       25.30778221       23.56106307       24.12258706 
##         residents   young_residents per_capita_income 
##        0.09184368        0.68829667        0.02999882

By default vcovHC() estimates a heteroskedasticity consistent (HC) variance covariance matrix for the parameters. There are several ways to estimate such a HC matrix, and by default vcovHC() estimates the “HC3” one. You can refer to Zeileis (2004) for more details.

We see that the standard errors are much larger than before! The intercept and regionwest variables are not statistically significant anymore.

The biggest problem with heteroskedasticity is that it can introduce bias in error terms.  That’s not the end of the world, but if the level of heteroskedasticity is serious enough, we want to find ways to account for it.  H/T R-Bloggers.