Understanding Bootstrap Aggregating (Bagging)

Gabriel Vasconcelos explains the bagging technique:

The name bagging comes from boostrap aggregating. It is a machine learning technique proposed by Breiman (1996) to increase stability in potentially unstable estimators. For example, suppose you want to run a regression with a few variables in two steps. First, you run the regression with all the variables in your data and select the significant ones. Second, you run a new regression using only the selected variables and compute the predictions.

This procedure is not wrong if your problem is forecasting. However, this two step estimation may result in highly unstable models. If many variables are important but individually their importance is small, you will probably leave some of them out, and small perturbations on the data may drastically change the results.

Read on to see how bootstrap aggregation works and how it solves this solution instability problem.

Related Posts

Road Construction Incentive Contracts And R

Sebastian Kranz promotes an interesting RTutor project: Patrick Bajari and Gregory Lewis have collected a detailed sample of 466 road construction projects in Minnesota to study this question in their very interesting article Moral Hazard, Incentive Contracts and Risk: Evidence from Procurement in the Review of Economic Studies, 2014.They estimate a structural econometric model and find that […]

Read More

Analyzing Customer Churn With Keras And H2O

Shirin Glander has released code pertaining to a forthcoming book chapter: This is code that accompanies a book chapter on customer churn that I have written for the German dpunkt Verlag. The book is in German and will probably appear in February: https://www.dpunkt.de/buecher/13208/9783864906107-data-science.html.The code you find below can be used to recreate all figures and analyses from this […]

Read More