Dealing With Multicollinearity With R

2018-08-21

Perfect multicollinearity occurs when one independent variable is an exact linear combination of other variables. For example, you already have X and Y as independent variables and you add another variable, Z = a*X + b*Y, to the set of independent variables. Now, this new variable, Z, does not add any significant or different value than provided by X or Y. The model can adjust itself to set the parameters that this combination is taken care of while determining the coefficients.

Multicollinearity may arise from several factors. Inclusion or incorrect use of dummy variables in the system may lead to multicollinearity. The other reason could be the usage of derived variables, i.e., one variable is computed from other variables in the system. This is similar to the example we took at the beginning of the article. The other reason could be taking variables which are similar in nature or which provide similar information or the variables which have very high correlation among each other.

Multicollinearity can make regression analysis trickier, and it’s worth knowing about.  H/T R-bloggers.

Sales Predictions with Pandas

2019-06-19

Megan Quinn shows how you can use Pandas and linear regression to predict sales figures: Pandas is an open-source Python package that provides users with high-performing and flexible data structures. These structures are designed to make analyzing relational or labeled data both easy and intuitive. Pandas is one of the most popular and quintessential tools leveraged […]

Linear Regression Assumptions

2019-06-17

Stephanie Glen has a chart which explains the four key assumptions behind when Ordinary Least Squares is the Best Linear Unbiased Estimator: If any of the main assumptions of linear regression are violated, any results or forecasts that you glean from your data will be extremely biased, inefficient or misleading. Navigating all of the different assumptions […]