Overfitting On Decision Trees

Ramandeep Kaur explains overfitting as well as how to prevent overfitting on decision trees:

Causes of Overfitting

There are two major situations that could cause overfitting in DTrees:

  1. Overfitting Due to Presence of Noise – Mislabeled instances may contradict the class labels of other similar records.
  2. Overfitting Due to Lack of Representative Instances – Lack of representative instances in the training data can prevent refinement of the learning algorithm.

                      A good model must not only fit the training data well
                      but also accurately classify records it has never seen.

How to avoid overfitting?

There are 2 major approaches to avoid overfitting in DTrees.

  1. approaches that stop growing the tree earlier, before it reaches the point where it perfectly classifies the training data.

  2. approaches that allow the tree to overfit the data, and then post-prune the tree.

Click through for more details on these two approaches.

Related Posts

Building an Image Classifier with PyTorch

Rogier van der Geer shows how you can use PyTorch to build out a Convolutional Neural Network for image classification: The tool that we are going to use to make a classifier is called a convolutional neural network, or CNN. You can find a great explanation of what these are right here on wikipedia. But we […]

Read More

xgboost and Small Numbers of Subtrees

John Mount covers an interesting issue you can run into when using xgboost: While reading Dr. Nina Zumel’s excellent note on bias in common ensemble methods, I ran the examples to see the effects she described (and I think it is very important that she is establishing the issue, prior to discussing mitigation).In doing that I ran into one more […]

Read More


September 2017
« Aug Oct »