internal node to determine whether replacing it with the most frequent
class reduces the tree's accuracy. If the accuracy is not reduced, the
node is pruned. The procedure continues until any further pruning would
decrease the accuracy.
To estimate the accuracy, Quinlan (1987) proposes using a separate
pruning set. It can be shown that this procedure ends with the smallest
accurate sub-tree with respect to a given pruning set.
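The procedure above can be sketched in a few lines; a minimal sketch, assuming a toy tree representation — the `Node` class, `predict`, and the other names are illustrative, not Quinlan's original notation:

```python
# A minimal sketch of reduced error pruning with a pruning set;
# the Node representation and function names are illustrative assumptions.
class Node:
    def __init__(self, attr=None, children=None, label=None):
        self.attr = attr                 # attribute tested at this node
        self.children = children or {}   # attribute value -> child Node
        self.label = label               # most frequent class at this node

    def is_leaf(self):
        return not self.children

def predict(node, x):
    while not node.is_leaf():
        node = node.children[x[node.attr]]
    return node.label

def accuracy(root, pruning_set):
    return sum(predict(root, x) == y for x, y in pruning_set) / len(pruning_set)

def reduced_error_prune(root, pruning_set):
    """Bottom-up pass: tentatively replace each internal node with a leaf
    labelled by its most frequent class, and keep the replacement whenever
    accuracy on the pruning set is not reduced."""
    def visit(node):
        for child in node.children.values():
            visit(child)
        if node.is_leaf():
            return
        before = accuracy(root, pruning_set)
        saved = (node.attr, node.children)
        node.attr, node.children = None, {}      # tentatively prune
        if accuracy(root, pruning_set) < before:
            node.attr, node.children = saved     # accuracy dropped: undo
    visit(root)
    return root
```

Note that each tentative prune is evaluated against the whole tree's accuracy on the pruning set, so a node is kept pruned only when the replacement does not hurt overall accuracy.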
6.2.4 Minimum Error Pruning (MEP)
MEP, proposed by Niblett and Bratko (1986), involves a bottom-up
traversal of the internal nodes. In each node, this technique compares
the $l$-probability error rate estimation with and without pruning.
The $l$-probability error rate estimation is a correction to the simple
probability estimation using frequencies. If $S_t$ denotes the instances
that have reached a leaf $t$, then the expected error rate in this leaf is:
$$\varepsilon(t) = 1 - \max_{c_i \in \mathrm{dom}(y)} \frac{\bigl|\sigma_{y = c_i} S_t\bigr| + l \cdot p_{apr}(y = c_i)}{|S_t| + l} \qquad (6.2)$$
where $p_{apr}(y = c_i)$ is the a priori probability of $y$ getting the value $c_i$, and
$l$ denotes the weight given to the a priori probability.
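Eq. (6.2) translates into a short function; a minimal sketch, in which the function name and the toy priors are assumptions made for illustration:

```python
# Sketch of the l-probability error rate estimate of Eq. (6.2);
# the function name and its argument layout are illustrative.
from collections import Counter

def expected_error(labels, priors, l):
    """labels: class labels of the instances S_t reaching leaf t;
    priors: the a priori probability p_apr(y = c_i) for each class c_i;
    l: weight given to the a priori probabilities."""
    counts = Counter(labels)
    n = len(labels)
    # Probability of the best (most frequent, prior-corrected) class.
    best = max((counts.get(c, 0) + l * p) / (n + l) for c, p in priors.items())
    return 1.0 - best
```

For instance, with $l = 2$ and uniform priors over two classes, a leaf holding eight 'a' and two 'b' instances gives $(8 + 1)/(10 + 2) = 0.75$ for the majority class, i.e. an expected error rate of $0.25$.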
The error rate of an internal node is the weighted average of the error
rates of its branches, where each weight is the proportion of instances
routed along the corresponding branch. The calculation is performed
recursively down to the leaves.
If an internal node is pruned, it becomes a leaf and its error rate is
calculated directly using Eq. (6.2). Consequently, we can compare the
error rate before and after pruning a given internal node. If pruning
the node does not increase the error rate, the pruning should be accepted.
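The recursive comparison can be sketched as follows; a sketch under assumed names — the `Node` class, `mep_prune`, and the tree layout are illustrative, and the leaf estimate of Eq. (6.2) is repeated so the snippet is self-contained:

```python
# Sketch of the recursive MEP decision: a node's static error (Eq. 6.2,
# as if it were pruned to a leaf) is compared with the weighted average
# error of its branches; prune when the static error is not larger.
# The Node class and function names are illustrative assumptions.
from collections import Counter

class Node:
    def __init__(self, labels, children=None):
        self.labels = labels             # labels of instances reaching this node
        self.children = children or {}   # attribute value -> child Node

    def is_leaf(self):
        return not self.children

def expected_error(labels, priors, l):
    """l-probability error rate estimate of Eq. (6.2)."""
    counts = Counter(labels)
    n = len(labels)
    best = max((counts.get(c, 0) + l * p) / (n + l) for c, p in priors.items())
    return 1.0 - best

def mep_prune(node, priors, l):
    """Bottom-up MEP; returns the subtree's error rate after pruning."""
    static = expected_error(node.labels, priors, l)  # error if node becomes a leaf
    if node.is_leaf():
        return static
    n = len(node.labels)
    backed_up = sum(len(c.labels) / n * mep_prune(c, priors, l)
                    for c in node.children.values())
    if static <= backed_up:      # pruning does not increase the error rate
        node.children = {}       # replace the subtree with a leaf
        return static
    return backed_up
```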
6.2.5 Pessimistic Pruning
Pessimistic pruning avoids the need for a pruning set or cross-validation,
relying instead on a pessimistic statistical correction [Quinlan (1993)].
The basic idea is that the error rate estimated on the training set is
not reliable enough. Instead, a more realistic measure, known as the
continuity correction for the binomial distribution, should be used:
$$\varepsilon'(T, S) = \varepsilon(T, S) + \frac{|\mathit{leaves}(T)|}{2 \cdot |S|} \qquad (6.3)$$
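The correction in Eq. (6.3) is a one-liner; a minimal sketch, with the function and argument names chosen for illustration:

```python
# Sketch of the continuity-corrected error rate of Eq. (6.3);
# function and argument names are illustrative.
def pessimistic_error(train_error, n_leaves, n_instances):
    """epsilon'(T, S) = epsilon(T, S) + |leaves(T)| / (2 * |S|)."""
    return train_error + n_leaves / (2 * n_instances)
```

For example, a tree with 5 leaves and a training error rate of 0.1 on 100 instances gets a corrected error rate of $0.1 + 5/200 = 0.125$; the penalty grows with the number of leaves, disfavouring larger trees.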