John Mount would like you to take care when using smoothers:
Here is a quick data-scientist / data-analyst question: what is the overall trend or shape in the following noisy data? For our specific example: How do we relate value as a noisy function (or relation) of m? This example arose in producing our tutorial “The Nature of Overfitting”. One would think this would be safe and easy to assess in R using ggplot2::geom_smooth(), but now we are not so sure.
Here’s a quick summary of my general philosophy: the data are more interesting than a smoothed line. I’m okay putting in a smoothed line to help a reader make sense of a trend, but I wouldn’t want to have a plot with just the smoothed line. Read the whole thing from John to get well beyond my rule of thumb.
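To make that rule of thumb concrete, here is a minimal sketch of drawing the raw points first and adding the smoother on top as a guide. The data are synthetic and invented purely for illustration (they are not John's data); only the column names m and value come from his example, and the choice of loess is my own assumption rather than anything his post prescribes.

```r
library(ggplot2)

# Synthetic data, made up for this sketch: a square-root trend plus noise.
set.seed(2024)
d <- data.frame(m = rep(1:20, each = 5))
d$value <- sqrt(d$m) + rnorm(nrow(d), sd = 0.5)

# Plot the raw points first, then overlay the smoother as a visual guide.
# The method and formula are spelled out so geom_smooth() does not pick defaults silently.
p <- ggplot(d, aes(x = m, y = value)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE)

print(p)
```

Dropping the geom_point() layer would leave only the smoothed line, which is the kind of plot I would avoid.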