Chapter 6
Pruning Trees
6.1 Stopping Criteria
The growing phase continues until a stopping criterion is triggered. The
following conditions are common stopping rules (a code sketch of these rules
follows the list):
(1) All instances in the training set belong to a single value of y.
(2) The maximum tree depth has been reached.
(3) The number of cases in the terminal node is less than the minimum
number of cases for parent nodes.
(4) If the node were split, the number of cases in one or more child nodes
would be less than the minimum number of cases for child nodes.
(5) The best splitting criterion value is not greater than a certain threshold.
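The following sketch shows how such rules might be combined in a tree-growing loop. It is a minimal illustration, not an implementation from the text; the function name should_stop and the thresholds max_depth, min_parent_cases, min_child_cases, and min_split_gain are hypothetical placeholders for the limits referred to above.

def should_stop(y, depth, candidate_split, max_depth=10,
                min_parent_cases=20, min_child_cases=7,
                min_split_gain=1e-4):
    """Return True if growing should stop at the current node.

    y               -- target values of the training cases in the node
    depth           -- depth of the node in the tree
    candidate_split -- (gain, child_case_counts) for the best split
                       found, or None if no split is available
    All default thresholds are illustrative, not values from the text.
    """
    if len(set(y)) == 1:           # rule (1): all cases share one value of y
        return True
    if depth >= max_depth:         # rule (2): maximum tree depth reached
        return True
    if len(y) < min_parent_cases:  # rule (3): too few cases to act as a parent
        return True
    if candidate_split is None:    # no admissible split exists
        return True
    gain, child_counts = candidate_split
    if any(n < min_child_cases for n in child_counts):
        return True                # rule (4): some child node would be too small
    if gain <= min_split_gain:     # rule (5): best split below the threshold
        return True
    return False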
6.2 Heuristic Pruning
6.2.1 Overview
Employing tight stopping criteria tends to create small, underfitted
decision trees. Using loose stopping criteria, on the other hand, tends to
generate large decision trees that are overfitted to the training set. To
resolve this dilemma, Breiman et al. (1984) developed a pruning methodology
based on a loose stopping criterion that allows the decision tree to overfit
the training set. The overfitted tree is then cut back into a smaller tree
by removing sub-branches that do not contribute to the generalization
accuracy. Various studies have shown that pruning methods can improve the
generalization performance of a decision tree, especially in noisy
domains.
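The grow-then-prune strategy of Breiman et al. (1984), minimal cost-complexity pruning, is available in scikit-learn through the ccp_alpha parameter of its tree estimators. The sketch below grows an unrestricted tree, computes the pruning path, and reports how tree size and test accuracy change as pruning becomes more aggressive; the dataset and random seeds are arbitrary choices for the example.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Grow a large tree under loose stopping criteria, then compute the
# sequence of effective alphas for minimal cost-complexity pruning.
full_tree = DecisionTreeClassifier(random_state=0)
path = full_tree.cost_complexity_pruning_path(X_train, y_train)

# Refit once per candidate alpha; larger alpha removes more sub-branches.
for alpha in path.ccp_alphas:
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(
        X_train, y_train)
    print(f"alpha={alpha:.5f}  leaves={pruned.get_n_leaves()}  "
          f"test accuracy={pruned.score(X_test, y_test):.3f}")

In practice, the alpha that yields the best held-out accuracy is selected, producing a tree that is smaller and usually no less accurate than the fully grown one.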
Another key motivation of pruning is “trading accuracy for simplicity”,
as presented by Bratko and Bohanec (1994). When the goal is to produce a
sufficiently accurate yet compact concept description, pruning is highly
useful.