Database Reference
In-Depth Information
6.3 Optimal Pruning
The issue of finding the optimal pruning method has been studied by
Bratko and Bohanec (1994) and Almuallim (1996). The Optimal Pruning
Technique (OPT) algorithm guarantees optimality based on dynamic
programming, with the complexity of Θ(
| 2 ), where T is the
initial decision tree. Almuallim (1996) introduced an improvement of
OPT called OPT-2, which also performs optimal pruning using dynamic
programming. However, the time and space complexities of OPT-2 are both
Θ(
|
leaves ( T )
), where T is the target (pruned) decision tree
and T is the initial decision tree.
Since the pruned tree is habitually much smaller than the initial tree
and the number of internal nodes is smaller than the number of leaves,
OPT-2 is usually more ecient than OPT in terms of computational
complexity.
According to Almuallim, when pruning a given decision tree ( DT )
with s leaves, it is common to progressively replace various sub-trees
of DT by leaves, thus leading to a sequence of pruned DT, DT s− 1 ,
DT s− 2 ,..., DT i ,..., DT 1 , such that each DT i has at most i leaves. When
seeing the error as the difference from the original tree ( DT ), the trees in
the sequence are of increasing error. The goal in choosing the best tree from
the above sequence of pruned trees, according to some appropriate criteria,
is to achieve a good balance between the trees size and accuracy.
In their paper, Bratko and Bohanec (1994) address the following prob-
lem: “Given a completely accurate but complex definition of a concept,
simplify the definition, possibly at the expanse of accuracy, so that the
simplified definition still corresponds to the concept 'suciently' well”. In
this context, “concepts” are represented by decision trees and the method
of simplification is tree pruning. Therefore, the problem can be stated as:
“Given a decision tree that accurately specifies a concept, the problem is to
find the smallest pruned tree that still represents the concept within some
specified accuracy.”
In a new algorithm [Almuallim (1996)], OPT-2, which also performs
optimal pruning, is introduced. The problem of optimal pruning is formally
presented in this paper as follows: “Given a DT and a positive integer
C , find a pruned DT from DT such that s ( DT )
leaves ( T )
|
|·|
internal ( T )
|
C and error( DT )is
minimized.” where s ( DT )isthesizeof DT (number of leaves in DT )and
error( DT ) is the proportion of cases that are incorrectly handled by DT .
The OPT-2 algorithm is based on dynamic programming. In its most
basic form, the time and space complexities of OPT-2 are both Θ( nC ),
Search WWH ::




Custom Search