Overfitting With Polynomial Regression

Vincent Granville shows us a few problems with polynomial regression:

Even if the function to be estimated is very smooth, due to machine precision, only the first three or four coefficients can be accurately computed. With infinite precision, all coefficients would be correctly computed without over-fitting. We first explore this problem from a mathematical point of view in the next section, then provide recommendations for practical model implementations in the last section.

This is also a good read for professionals with a math background interested in learning more about data science, as we start with some simple math, then discuss how it relates to data science. Also, this is an original article, not something you will learn in college classes or data camps, and it even features the solution to a linear regression involving an infinite number of variables.

Granville’s point that overfitting is a relatively small concern here is rather interesting, but the advice to avoid polynomial regression is generally pretty solid.
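To get a feel for why machine precision gets in the way, here is a minimal sketch in Python with NumPy. It is not Granville's setup: it only shows one well-known mechanism, namely that the design matrix for a polynomial fit in the plain monomial basis (1, x, x^2, ...) becomes badly conditioned as the degree grows. The sample points on [0, 1], the degrees chosen, and the helper name `vandermonde_condition` are all assumptions made for illustration.

```python
import numpy as np

def vandermonde_condition(degree, n_points=50):
    """Condition number of the monomial design matrix on [0, 1].

    Hypothetical helper for illustration; the grid and interval are assumptions.
    """
    x = np.linspace(0.0, 1.0, n_points)
    # Columns are x**0, x**1, ..., x**degree
    V = np.vander(x, N=degree + 1, increasing=True)
    return np.linalg.cond(V)

# Compare a few degrees; larger condition numbers mean that tiny rounding
# errors can be amplified into large errors in the fitted coefficients.
conds = {d: vandermonde_condition(d) for d in (3, 6, 10, 15)}
for d, c in conds.items():
    print(f"degree {d:2d}: condition number ~ {c:.2e}")
```

The condition number roughly bounds how much relative error in the inputs can be magnified in the solution, so when it approaches the reciprocal of double-precision machine epsilon (about 1e16), the higher-order coefficients stop being trustworthy. Orthogonal polynomial bases are a common way to reduce this effect, though that is a general numerical remedy rather than something taken from the quoted article.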


