Smoothing and its Inherent Risks

John Mount would like you to take care when using smoothers:

Here is a quick data-scientist / data-analyst question: what is the overall trend or shape in the following noisy data? For our specific example: How do we relate value as a noisy function (or relation) of m? This example arose in producing our tutorial “The Nature of Overfitting”.

One would think this would be safe and easy to assess in R using ggplot2::geom_smooth(), but now we are not so sure.

Here’s a quick summary of my general philosophy: the data are more interesting than a smoothed line. I’m okay putting in a smoothed line to help a reader make sense of a trend, but I wouldn’t want to have a plot with just the smoothed line. Read the whole thing from John to get well beyond my rule of thumb.