Regularization Prevents Overfitting

Hui Li has an explanation of what regularization is and how it works to reduce the likelihood of overfitting training data:

Assume that the red line is the regression model we learn from the training data set. It can be seen that the learned model fits the training data set perfectly, while it cannot generalize well to the data not included in the training set. There are several ways to avoid the problem of overfitting.

To remedy this problem, we could:

  • Get more training examples.
  • Use a simple predictor.
  • Select a subsample of features.

In this blog post, we focus on the second and third ways to avoid overfitting by introducing regularization on the parameters βi of the model.

Read the whole thing.

Related Posts

Unintentional Data

Eric Hollingsworth describes data science as the cost of collecting data approaches zero: Thankfully not only have modern data analysis tools made data collection cheap and easy, they have made the process of exploratory data analysis cheaper and easier as well. Yet when we use these tools to explore data and look for anomalies or […]

Read More

Measuring Semantic Relatedness

Sandipan Dey re-works a university assignment on semantic relatedness in Python: Let’s define the semantic relatedness of two WordNet nouns x and y as follows: A = set of synsets in which x appears B = set of synsets in which y appears distance(x, y) = length of shortest ancestral path of subsets A and B sca(x, y) = a shortest common ancestor of subsets A and B This is the notion of […]

Read More


July 2017
« Jun Aug »