Understanding Boosted Trees

Maria Jesus Alonso explains decision trees and their subsequent improvements:

Bagging (or Bootstrap Aggregating), the second prediction technique brought to the BigML Dashboard and API, uses a collection of trees (rather than a single one), each tree built with a different random subset of the original dataset for each model in the ensemble. Specifically, BigML defaults to a sampling rate of 100% (with replacement) for each model. This means some of the original instances will be repeated and others will be left out. Bagging performs well when a dataset has many noisy features and only one or two are relevant. In those cases, Bagging will be the best option.
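To see why a 100% sampling rate with replacement repeats some instances and leaves others out, here is a minimal sketch in plain Python. It is not BigML's implementation; the `bootstrap_sample` helper and the toy dataset of row indices are assumptions made for illustration.

```python
import random

def bootstrap_sample(rows, rate=1.0, rng=None):
    """Draw int(len(rows) * rate) rows with replacement (hypothetical helper)."""
    rng = rng or random.Random()
    k = int(len(rows) * rate)
    # Each draw is independent, so the same row can be picked more than once.
    return [rng.choice(rows) for _ in range(k)]

rng = random.Random(42)          # fixed seed so the run is repeatable
rows = list(range(1000))         # stand-in for 1,000 dataset instances
sample = bootstrap_sample(rows, rate=1.0, rng=rng)

unique = set(sample)             # instances that made it into this tree's sample
left_out = set(rows) - unique    # instances this tree never sees

print(len(sample), len(unique), len(left_out))
```

The sample is the same size as the original dataset, but on average only about 63% of the distinct instances appear in it (the expected fraction is 1 - 1/e for large datasets). Each tree in the ensemble would get its own draw, which is where the variety between trees comes from.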

Random Decision Forests extend the Bagging technique by only considering a random subset of the input fields at each split of the tree. By adding randomness in this process, Random Decision Forests help avoid overfitting. When there are many useful fields in your dataset, Random Decision Forests are a strong choice.
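The extra randomness can be sketched as follows: at every split, only a random handful of input fields is eligible. This is an illustration, not BigML's code. The `candidate_fields` helper, the field names, and the square-root default for the subset size are all assumptions (the square root is a common heuristic, not necessarily what BigML uses).

```python
import math
import random

def candidate_fields(fields, rng, k=None):
    """Pick the fields a single split is allowed to consider (hypothetical helper)."""
    if k is None:
        # Assumed default: square root of the number of fields, at least one.
        k = max(1, int(math.sqrt(len(fields))))
    # sample() draws without replacement, so no field appears twice per split.
    return rng.sample(fields, k)

rng = random.Random(0)
fields = ["age", "income", "region", "tenure", "plan", "usage", "support_calls", "discount"]

# Three successive splits each see a different random subset of the fields.
split_candidates = [candidate_fields(fields, rng) for _ in range(3)]
for i, subset in enumerate(split_candidates, start=1):
    print(f"split {i}: {subset}")
```

Because no single strong field can dominate every split, the trees end up less correlated with one another than they would be under plain Bagging.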

Click through for how boosted trees change this model a bit.


