internal node to determine whether replacing it with the most frequent
class reduces the tree's accuracy. If the accuracy is not reduced, the
node is pruned. The procedure continues until any further pruning would
decrease the accuracy.
To estimate the accuracy, Quinlan (1987) proposes using a separate
pruning set. It can be shown that this procedure ends with the smallest
accurate sub-tree with respect to a given pruning set.
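The procedure above can be sketched in a few lines; a minimal sketch, assuming a toy tree representation — the `Node` class, `predict`, and the other names are illustrative, not Quinlan's original notation:

```python
# A minimal sketch of reduced error pruning with a pruning set;
# the Node representation and function names are illustrative assumptions.
class Node:
    def __init__(self, attr=None, children=None, label=None):
        self.attr = attr                 # attribute tested at this node
        self.children = children or {}   # attribute value -> child Node
        self.label = label               # most frequent class at this node

    def is_leaf(self):
        return not self.children

def predict(node, x):
    while not node.is_leaf():
        node = node.children[x[node.attr]]
    return node.label

def accuracy(root, pruning_set):
    return sum(predict(root, x) == y for x, y in pruning_set) / len(pruning_set)

def reduced_error_prune(root, pruning_set):
    """Bottom-up pass: tentatively replace each internal node with a leaf
    labelled by its most frequent class, and keep the replacement whenever
    accuracy on the pruning set is not reduced."""
    def visit(node):
        for child in node.children.values():
            visit(child)
        if node.is_leaf():
            return
        before = accuracy(root, pruning_set)
        saved = (node.attr, node.children)
        node.attr, node.children = None, {}      # tentatively prune
        if accuracy(root, pruning_set) < before:
            node.attr, node.children = saved     # accuracy dropped: undo
    visit(root)
    return root
```

Note that each tentative prune is evaluated against the whole tree's accuracy on the pruning set, so a node is kept pruned only when the replacement does not hurt overall accuracy.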
6.2.4 Minimum Error Pruning (MEP)
MEP, proposed by Niblett and Bratko (1986), involves a bottom-up
traversal of the internal nodes. In each node, this technique compares
the $l$-probability error rate estimation with and without pruning.
The $l$-probability error rate estimation is a correction to the simple
probability estimation using frequencies. If $S_t$ denotes the instances
that have reached a leaf $t$, then the expected error rate in this leaf is:
$$\varepsilon(t) = 1 - \max_{c_i \in \mathrm{dom}(y)} \frac{\bigl|\sigma_{y = c_i} S_t\bigr| + l \cdot p_{apr}(y = c_i)}{|S_t| + l} \qquad (6.2)$$
where $p_{apr}(y = c_i)$ is the a priori probability of $y$ getting the value $c_i$, and
$l$ denotes the weight given to the a priori probability.
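Eq. (6.2) translates into a short function; a minimal sketch, in which the function name and the toy priors are assumptions made for illustration:

```python
# Sketch of the l-probability error rate estimate of Eq. (6.2);
# the function name and its argument layout are illustrative.
from collections import Counter

def expected_error(labels, priors, l):
    """labels: class labels of the instances S_t reaching leaf t;
    priors: the a priori probability p_apr(y = c_i) for each class c_i;
    l: weight given to the a priori probabilities."""
    counts = Counter(labels)
    n = len(labels)
    # Probability of the best (most frequent, prior-corrected) class.
    best = max((counts.get(c, 0) + l * p) / (n + l) for c, p in priors.items())
    return 1.0 - best
```

For instance, with $l = 2$ and uniform priors over two classes, a leaf holding eight 'a' and two 'b' instances gives $(8 + 1)/(10 + 2) = 0.75$ for the majority class, i.e. an expected error rate of $0.25$.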
The error rate of an internal node is the weighted average of the error
rates of its branches, where each weight is the proportion of instances
routed along the corresponding branch. The calculation is performed
recursively down to the leaves.
If an internal node is pruned, it becomes a leaf and its error rate is
calculated directly using Eq. (6.2). Consequently, we can compare the
error rate before and after pruning a given internal node. If pruning
the node does not increase the error rate, the pruning should be accepted.
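The recursive comparison can be sketched as follows; a sketch under assumed names — the `Node` class, `mep_prune`, and the tree layout are illustrative, and the leaf estimate of Eq. (6.2) is repeated so the snippet is self-contained:

```python
# Sketch of the recursive MEP decision: a node's static error (Eq. 6.2,
# as if it were pruned to a leaf) is compared with the weighted average
# error of its branches; prune when the static error is not larger.
# The Node class and function names are illustrative assumptions.
from collections import Counter

class Node:
    def __init__(self, labels, children=None):
        self.labels = labels             # labels of instances reaching this node
        self.children = children or {}   # attribute value -> child Node

    def is_leaf(self):
        return not self.children

def expected_error(labels, priors, l):
    """l-probability error rate estimate of Eq. (6.2)."""
    counts = Counter(labels)
    n = len(labels)
    best = max((counts.get(c, 0) + l * p) / (n + l) for c, p in priors.items())
    return 1.0 - best

def mep_prune(node, priors, l):
    """Bottom-up MEP; returns the subtree's error rate after pruning."""
    static = expected_error(node.labels, priors, l)  # error if node becomes a leaf
    if node.is_leaf():
        return static
    n = len(node.labels)
    backed_up = sum(len(c.labels) / n * mep_prune(c, priors, l)
                    for c in node.children.values())
    if static <= backed_up:      # pruning does not increase the error rate
        node.children = {}       # replace the subtree with a leaf
        return static
    return backed_up
```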
6.2.5 Pessimistic Pruning
Pessimistic pruning avoids the need for a pruning set or cross-validation,
relying instead on a pessimistic statistical correction [Quinlan (1993)].
The basic idea is that the error rate estimated on the training set is
not reliable enough. Instead, a more realistic measure, known as the
continuity correction for the binomial distribution, should be used:
$$\varepsilon'(T, S) = \varepsilon(T, S) + \frac{|\mathit{leaves}(T)|}{2 \cdot |S|} \qquad (6.3)$$
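The correction in Eq. (6.3) is a one-liner; a minimal sketch, with the function and argument names chosen for illustration:

```python
# Sketch of the continuity-corrected error rate of Eq. (6.3);
# function and argument names are illustrative.
def pessimistic_error(train_error, n_leaves, n_instances):
    """epsilon'(T, S) = epsilon(T, S) + |leaves(T)| / (2 * |S|)."""
    return train_error + n_leaves / (2 * n_instances)
```

For example, a tree with 5 leaves and a training error rate of 0.1 on 100 instances gets a corrected error rate of $0.1 + 5/200 = 0.125$; the penalty grows with the number of leaves, disfavouring larger trees.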