Causes of Overfitting
Two situations commonly cause overfitting in decision trees (DTrees):
- Overfitting Due to Presence of Noise – Mislabeled instances may contradict the class labels of other similar records.
- Overfitting Due to Lack of Representative Instances – When the training data contains too few representative instances, the tree ends up making splits based on only a handful of records, and those splits are unlikely to generalize.
A good model must not only fit the training data well but also accurately classify records it has never seen.
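One way to see the gap between fitting the training data and classifying unseen records is to grow an unrestricted tree on noisy data and compare the two accuracies. The sketch below is illustrative only: it assumes scikit-learn is available, uses synthetic data from `make_classification`, and uses `flip_y=0.2` as a stand-in for mislabeled instances. None of these choices come from the original text.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (an assumption for illustration). flip_y=0.2 assigns a
# random class to roughly 20% of the records, imitating label noise.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=4,
    flip_y=0.2, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# No depth limit: the tree keeps splitting until every leaf is pure,
# so it also memorizes the noisy labels.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = full_tree.score(X_train, y_train)  # accuracy on seen records
test_acc = full_tree.score(X_test, y_test)     # accuracy on unseen records

print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
```

A large difference between the two numbers suggests the tree has fit noise rather than structure that carries over to new records.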
How to avoid overfitting?
There are two major approaches to avoiding overfitting in DTrees:
- Approaches that stop growing the tree early, before it reaches the point where it perfectly classifies the training data (pre-pruning).
- Approaches that allow the tree to overfit the data and then post-prune it.
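The two approaches can be sketched with scikit-learn, which is an assumption here since the text names no library. Early stopping is approximated with the `max_depth` and `min_samples_leaf` parameters, and post-pruning with cost-complexity pruning (`ccp_alpha`), which is one post-pruning technique among several. The synthetic data and all parameter values are illustrative, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic noisy data (an assumption for illustration).
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=4,
    flip_y=0.2, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Baseline: a fully grown tree that fits the training data perfectly.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Approach 1: stop growing early by limiting the depth and requiring
# a minimum number of training records in each leaf.
early_tree = DecisionTreeClassifier(
    max_depth=4, min_samples_leaf=10, random_state=0
).fit(X_train, y_train)

# Approach 2: let the tree overfit, then prune it back.
# cost_complexity_pruning_path returns the candidate alpha values in
# ascending order; larger alphas prune more aggressively.
path = full_tree.cost_complexity_pruning_path(X_train, y_train)
# Arbitrary pick from the middle of the path to keep the sketch short;
# in practice alpha would be chosen by cross-validation.
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]
post_tree = DecisionTreeClassifier(
    ccp_alpha=alpha, random_state=0
).fit(X_train, y_train)

for name, tree in [("full", full_tree), ("early stop", early_tree),
                   ("post-pruned", post_tree)]:
    print(f"{name:12s} nodes={tree.tree_.node_count:4d} "
          f"test accuracy={tree.score(X_test, y_test):.3f}")
```

Both restricted trees end up with fewer nodes than the fully grown one. Whether they classify unseen records better depends on the data and on the chosen parameters, so the test accuracies should be compared rather than assumed.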