Regularization Prevents Overfitting

Hui Li has an explanation of what regularization is and how it works to reduce the likelihood of overfitting training data:

Assume that the red line is the regression model we learn from the training data set. It can be seen that the learned model fits the training data set perfectly, while it cannot generalize well to the data not included in the training set. There are several ways to avoid the problem of overfitting.

To remedy this problem, we could:

  • Get more training examples.
  • Use a simple predictor.
  • Select a subsample of features.

In this blog post, we focus on the second and third ways to avoid overfitting by introducing regularization on the parameters βi of the model.

Read the whole thing.

Related Posts

Conjoint Analysis In R

Abhijit Telang introduces the concept of conjoint analysis and shows how you can implement this in R: We will need to typically transform the problem of utility modeling from its intangible, abstract form to something that is measurable. That is, we wish to assign a numeric value to the perceived utility by the consumer, and […]

Read More

Bayesian Modeling Of Hardware Failure Rates

Sean Owen shows how you can use Bayesian statistical approaches with Spark Streaming, using the example of hard drive failure rates: This data doesn’t arrive all at once, in reality. It arrives in a stream, and so it’s natural to run these kind of queries continuously. This is simple with Apache Spark’s Structured Streaming, and proceeds […]

Read More


July 2017
« Jun Aug »