Regularization Prevents Overfitting

Hui Li has an explanation of what regularization is and how it works to reduce the likelihood of overfitting training data:

Assume that the red line is the regression model we learn from the training data set. It can be seen that the learned model fits the training data set perfectly, while it cannot generalize well to the data not included in the training set. There are several ways to avoid the problem of overfitting.

To remedy this problem, we could:

  • Get more training examples.
  • Use a simple predictor.
  • Select a subsample of features.

In this blog post, we focus on the second and third ways to avoid overfitting by introducing regularization on the parameters βi of the model.

Read the whole thing.

Related Posts

Biases in Tree-Based Models

Nina Zumel looks at tree-based ensembling models like random forest and gradient boost and shows that they can be biased: In our previous article , we showed that generalized linear models are unbiased, or calibrated: they preserve the conditional expectations and rollups of the training data. A calibrated model is important in many applications, particularly when financial data […]

Read More

Comparing Poisson Regression to Regressing Against Logs

Nina Zumel compares a pair of methods for performing regression when income is the dependent variable: Regressing against the log of the outcome will not be calibrated; however it has the advantage that the resulting model will have lower relative error than a Poisson regression against income. Minimizing relative error is appropriate in situations when […]

Read More


July 2017
« Jun Aug »