Polishing Uncalibrated Models

Nina Zumel takes an uncalibrated random forest model and applies a calibration technique to improve the estimate on one variable:

In the previous article in this series, we showed that common ensemble models like random forest and gradient boosting are uncalibrated: they are not guaranteed to estimate aggregates or rollups of the data in an unbiased way. However, they can be preferable to calibrated models such as linear or generalized linear regression, when they make more accurate predictions on individuals. In this article, we’ll demonstrate one ad-hoc method for calibrating an uncalibrated model with respect to specific grouping variables. This “polishing step” potentially returns a model that estimates certain rollups in an unbiased way, while retaining good performance on individual predictions.

This is a great explanation of the process as well as its risks and limitations.

Related Posts

MAPE and Its Flaws

Jan Fischer takes us through Mean Absolute Percentage Error as a measure of forecast quality: Particular small actual values bias the MAPE.If any true values are very close to zero, the corresponding absolute percentage errors will be extremely high and therefore bias the informativity of the MAPE (Hyndman & Koehler 2006). The following graph clarifies this […]

Read More

From Excel to R: Three Examples

Abdul Majed Raja has a few examples of things which are easy to do in Excel and how you can do them in R: Create a difference variable between the current value and the next valueThis is also known as lead and lag – especially in a time series dataset this varaible becomes very important in feature engineering. In […]

Read More


July 2019
« Jun Aug »