Database Reference
In-Depth Information
suciently accurate, compact concept description, pruning is highly useful.
Since within this process the initial decision tree is seen as a completely
accurate one, the accuracy of a pruned decision tree indicates how close it
is to the initial tree.
There are various techniques for pruning decision trees. Most perform
top-down or bottom-up traversal of the nodes. A node is pruned if this
operation improves a certain criteria. The following subsections describe
the most popular techniques.
6.2.2
Cost Complexity Pruning
Cost complexity pruning (also known as weakest link pruning or error
complexity pruning) proceeds in two stages [ Breiman et al . (1984) ] .Inthe
first stage, a sequence of trees T 0 ,T 1 ,...,T k is built on the training data
where T 0 is the original tree before pruning and T k is the root tree.
In the second stage, one of these trees is chosen as the pruned tree,
based on its generalization error estimation.
The tree T i +1 is obtained by replacing one or more of the sub-trees in
the predecessor tree T i with suitable leaves. The sub-trees that are pruned
are those that obtain the lowest increase in apparent error rate per pruned
leaf:
ε ( pruned ( T,t ) ,S )
ε ( T,S )
α =
,
(6.1)
|
leaves ( T )
|−|
leaves ( pruned ( T,t ))
|
where ε ( T,S ) indicates the error rate of the tree T over the sample S and
|
denotes the number of leaves in T . pruned ( T,t ) denotes the
tree obtained by replacing the node t in T with a suitable leaf.
In the second phase, the generalization error of each pruned tree is
estimated. The best pruned tree is then selected. If the given dataset is large
enough, the authors suggest breaking it into a training set and a pruning
set. The trees are constructed using the training set and evaluated on the
pruning set. On the other hand, if the given dataset is not large enough,
they propose using cross-validation methodology, despite the computational
complexity implications.
leaves ( T )
|
6.2.3
Reduced Error Pruning
A simple procedure for pruning decision trees, known as Reduced Error
Pruning, has been suggested by Quinlan (1987). While traversing over the
internal nodes from the bottom to the top, the procedure checks each
Search WWH ::




Custom Search