Overfitting On Decision Trees

Ramandeep Kaur explains overfitting as well as how to prevent overfitting on decision trees:

Causes of Overfitting

There are two major situations that could cause overfitting in DTrees:

  1. Overfitting Due to Presence of Noise – Mislabeled instances may contradict the class labels of other similar records.
  2. Overfitting Due to Lack of Representative Instances – Lack of representative instances in the training data can prevent refinement of the learning algorithm.

                      A good model must not only fit the training data well
                      but also accurately classify records it has never seen.

How to avoid overfitting?

There are 2 major approaches to avoid overfitting in DTrees.

  1. approaches that stop growing the tree earlier, before it reaches the point where it perfectly classifies the training data.

  2. approaches that allow the tree to overfit the data, and then post-prune the tree.

Click through for more details on these two approaches.

Related Posts

Unintentional Data

Eric Hollingsworth describes data science as the cost of collecting data approaches zero: Thankfully not only have modern data analysis tools made data collection cheap and easy, they have made the process of exploratory data analysis cheaper and easier as well. Yet when we use these tools to explore data and look for anomalies or […]

Read More

Measuring Semantic Relatedness

Sandipan Dey re-works a university assignment on semantic relatedness in Python: Let’s define the semantic relatedness of two WordNet nouns x and y as follows: A = set of synsets in which x appears B = set of synsets in which y appears distance(x, y) = length of shortest ancestral path of subsets A and B sca(x, y) = a shortest common ancestor of subsets A and B This is the notion of […]

Read More

Categories

September 2017
MTWTFSS
« Aug Oct »
 123
45678910
11121314151617
18192021222324
252627282930