Overfitting On Decision Trees

Ramandeep Kaur explains overfitting as well as how to prevent overfitting on decision trees:

Causes of Overfitting

There are two major situations that could cause overfitting in DTrees:

  1. Overfitting Due to Presence of Noise – Mislabeled instances may contradict the class labels of other similar records.
  2. Overfitting Due to Lack of Representative Instances – Lack of representative instances in the training data can prevent refinement of the learning algorithm.

                      A good model must not only fit the training data well
                      but also accurately classify records it has never seen.

How to avoid overfitting?

There are 2 major approaches to avoid overfitting in DTrees.

  1. approaches that stop growing the tree earlier, before it reaches the point where it perfectly classifies the training data.

  2. approaches that allow the tree to overfit the data, and then post-prune the tree.

Click through for more details on these two approaches.

Related Posts

Anomaly Detection With Python

Robert Sheldon continues his SQL Server Machine Learning Series: As important as these concepts are to working Python and MLS, the purpose in covering them was meant only to provide you with a foundation for doing what’s really important in MLS, that is, using Python (or the R language) to analyze data and present the […]

Read More

Non-English Natural Language Processing

The folks at BNOSAC have announced a new natural language processing toolkit for R: BNOSAC is happy to announce the release of the udpipe R package (https://bnosac.github.io/udpipe/en) which is a Natural Language Processing toolkit that provides language-agnostic ‘tokenization’, ‘parts of speech tagging’, ‘lemmatization’, ‘morphological feature tagging’ and ‘dependency parsing’ of raw text. Next to text […]

Read More

Categories

September 2017
MTWTFSS
« Aug Oct »
 123
45678910
11121314151617
18192021222324
252627282930