Biases in Tree-Based Models

Nina Zumel looks at tree-based ensembling models like random forest and gradient boost and shows that they can be biased:

In our previous article , we showed that generalized linear models are unbiased, or calibrated: they preserve the conditional expectations and rollups of the training data. A calibrated model is important in many applications, particularly when financial data is involved.

However, when making predictions on individuals, a biased model may be preferable; biased models may be more accurate, or make predictions with lower relative error than an unbiased model. For example, tree-based ensemble models tend to be highly accurate, and are often the modeling approach of choice for many machine learning applications. In this note, we will show that tree-based models are biased, or uncalibrated. This means they may not always represent the best bias/variance trade-off.

Read on for an example.

Related Posts


Nina Zumel announces a new version of WVPlots on CRAN: WVPlots was originally a catch-all package of ggplot2 visualizations that we at Win-Vector tended to use repeatedly, and wanted to turn into “one-liners.” A consequence of this is that the older visualizations had our preferred color schemes hard-coded in. More recent additions to the package sometimes had palette […]

Read More

Icon Maps in R

Laura Ellis shows how you can build maps full of little icons: That was ok, but we should try to make the images more aesthetically pleasing using the magick package. We make each image transparent with the image_transparent() function. We can also make the resulting image a specific color with image_colorize(). I then saved the […]

Read More


July 2019
« Jun Aug »