Ramandeep Kaur explains overfitting as well as how to prevent overfitting on decision trees:
Causes of Overfitting
There are two major situations that could cause overfitting in decision trees:
- Overfitting Due to Presence of Noise – Mislabeled instances may contradict the class labels of other similar records.
- Overfitting Due to Lack of Representative Instances – Lack of representative instances in the training data can prevent refinement of the learning algorithm.
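The first cause can be illustrated with a small experiment. This is a minimal sketch, assuming scikit-learn and NumPy are installed; the synthetic data, the 10% flip rate, and all variable names are assumptions made for illustration and do not come from the original article. It fits an unrestricted tree to clean labels and to the same labels with some deliberately mislabeled.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic data (assumption): the true class depends only on the sign of feature 0.
X = rng.normal(size=(200, 2))
y_clean = (X[:, 0] > 0).astype(int)

# Simulate noise by flipping the labels of 20 randomly chosen records (10%).
y_noisy = y_clean.copy()
flip = rng.choice(len(y_noisy), size=20, replace=False)
y_noisy[flip] = 1 - y_noisy[flip]

# Grow both trees without any depth or leaf-size limit.
clean_tree = DecisionTreeClassifier(random_state=0).fit(X, y_clean)
noisy_tree = DecisionTreeClassifier(random_state=0).fit(X, y_noisy)

print("leaves with clean labels:", clean_tree.get_n_leaves())
print("leaves with noisy labels:", noisy_tree.get_n_leaves())
print("training accuracy on noisy labels:", noisy_tree.score(X, y_noisy))
```

With clean labels a single split is enough. With noisy labels the tree keeps splitting until every mislabeled record sits in its own pure region, so it scores perfectly on the training data while encoding the noise.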
A good model must not only fit the training data well but also accurately classify records it has never seen.
How to avoid overfitting?
There are two major approaches to avoid overfitting in decision trees.
- Approaches that stop growing the tree earlier, before it reaches the point where it perfectly classifies the training data.
- Approaches that allow the tree to overfit the data, and then post-prune the tree.
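The two approaches can be sketched side by side. This is a hedged example, assuming scikit-learn is installed; the synthetic dataset, the specific limits (`max_depth=4`, `min_samples_leaf=10`), and the choice of a mid-range `ccp_alpha` are illustrative assumptions, not settings taken from the original article. Early stopping is expressed through growth limits, and post-pruning through scikit-learn's minimal cost-complexity pruning, which is one of several possible post-pruning methods.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 10% label noise (assumption, for illustration only).
X, y = make_classification(n_samples=600, n_features=10, n_informative=4,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Baseline: a fully grown tree that fits the training data perfectly.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Approach 1 (early stopping): limit depth and leaf size while growing.
pre = DecisionTreeClassifier(max_depth=4, min_samples_leaf=10,
                             random_state=0).fit(X_train, y_train)

# Approach 2 (post-pruning): compute the pruning path of the full tree,
# then refit with a nonzero ccp_alpha so weak subtrees are collapsed.
path = full.cost_complexity_pruning_path(X_train, y_train)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]  # arbitrary mid-range value
post = DecisionTreeClassifier(ccp_alpha=alpha,
                              random_state=0).fit(X_train, y_train)

for name, tree in [("full", full), ("pre-pruned", pre), ("post-pruned", post)]:
    print(name, "leaves:", tree.get_n_leaves(),
          "train:", round(tree.score(X_train, y_train), 3),
          "test:", round(tree.score(X_test, y_test), 3))
```

In practice the limits and `ccp_alpha` would normally be chosen by cross-validation rather than fixed by hand as they are here.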
Click through for more details on these two approaches.