Database Reference
In-Depth Information
However, this correction still produces an optimistic error rate. Conse-
quently, Quinlan (1993) suggests pruning an internal node t if its error rate
is within one standard error from a reference tree, namely:
ε ( T,S )
·
(1
ε ( T,S ))
ε ( pruned ( T,t ) ,S )
ε ( T,S )+
.
(6.4)
|
S
|
The last condition is based on the statistical confidence interval for
proportions. Usually, the last condition is used such that T refers to a sub-
tree whose root is the internal node t and S denotes the portion of the
training set that refers to the node t .
The pessimistic pruning procedure performs top-down traversal over
the internal nodes. If an internal node is pruned, then all its descendants
are removed from the pruning process, resulting in a relatively fast pruning.
6.2.6
)
EBP is an evolution of the pessimistic pruning. It is implemented in the
well-known C4.5 algorithm.
As in pessimistic pruning, the error rate is estimated using the upper
bound of the statistical confidence interval for proportions.
Error-Based Pruning
(
EBP
ε ( T,S )
·
(1
ε ( T,S ))
ε UB ( T,S )= ε ( T,S )+ Z α ·
,
(6.5)
|
S
|
where ε ( T,S ) denotes the misclassification rate of the tree T on the training
set S ; Z is the inverse of the standard normal cumulative distribution; and
α is the desired significance level.
Let subtree ( T,t ) denote the subtree rooted by the node t .Let
maxchild ( T,t ) denote the most frequent child node of t (namely most of
the instances in S reach this particular child) and let S t denote all instances
in S that reach the node t . The procedure traverses bottom-up all nodes
and compares the following values:
(1) ε UB ( subtree ( T,t ) ,S t )
(2) ε UB ( pruned ( subtree ( T,t ) ,t ) ,S t )
(3) ε UB ( subtree ( T,maxchild ( T,t )) ,S maxchild ( T,t ) ).
According to the lowest value, the procedure either leaves the tree as
is; prune the node t ; or replaces the node t with the subtree rooted by
maxchild ( T,t ).
Search WWH ::




Custom Search