Alternatively, prepruning and postpruning may be interleaved for a combined
approach. Postpruning requires more computation than prepruning, yet generally leads
to a more reliable tree. No single pruning method has been found to be superior to
all others. Although some pruning methods do depend on the availability of additional
data for pruning, this is usually not a concern when dealing with large databases.
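As a rough illustration of a pruning method that relies on additional data, the sketch below applies reduced-error pruning. This is one simple postpruning method chosen here for illustration; it is not named in the text above. The `PrunableNode` class, the attribute names, the class labels, and the tiny pruning set are all hypothetical. Working bottom-up, a subtree is replaced by a leaf labeled with its majority class whenever the leaf makes no more errors on the held-out pruning set than the subtree does.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

Record = Tuple[Dict[str, str], str]  # (attribute values, class label)


@dataclass
class PrunableNode:
    """Hypothetical internal node: branches on the value of one attribute."""
    attribute: str
    majority: str  # majority training class among tuples reaching this node
    children: Dict[str, "Tree"] = field(default_factory=dict)


Tree = Union[PrunableNode, str]  # a leaf is just a class label


def classify(tree: Tree, row: Dict[str, str]) -> str:
    """Follow the branches for `row`; fall back to the majority class on unseen values."""
    while isinstance(tree, PrunableNode):
        tree = tree.children.get(row[tree.attribute], tree.majority)
    return tree


def errors(tree: Tree, data: List[Record]) -> int:
    """Number of records in `data` that `tree` misclassifies."""
    return sum(classify(tree, row) != label for row, label in data)


def prune(tree: Tree, data: List[Record]) -> Tree:
    """Reduced-error pruning, bottom-up, using the pruning records that reach `tree`.

    Note: the tree is modified in place, and a subtree reached by no pruning
    records is collapsed to a leaf (there is no evidence for keeping it).
    """
    if isinstance(tree, str):
        return tree
    for value, child in list(tree.children.items()):
        subset = [(r, y) for r, y in data if r[tree.attribute] == value]
        tree.children[value] = prune(child, subset)
    leaf_errors = sum(y != tree.majority for _, y in data)
    if leaf_errors <= errors(tree, data):
        return tree.majority  # the leaf is at least as accurate: prune
    return tree


# Hypothetical unpruned tree whose "excellent" branch was fitted to noise.
tree = PrunableNode("student", "skips", {
    "yes": "buys",
    "no": PrunableNode("credit_rating", "skips",
                       {"fair": "skips", "excellent": "buys"}),
})

# Hypothetical pruning set, kept separate from the training data.
pruning_set: List[Record] = [
    ({"student": "yes", "credit_rating": "fair"}, "buys"),
    ({"student": "yes", "credit_rating": "excellent"}, "buys"),
    ({"student": "no", "credit_rating": "excellent"}, "skips"),
    ({"student": "no", "credit_rating": "excellent"}, "skips"),
    ({"student": "no", "credit_rating": "fair"}, "skips"),
]

errors_before = errors(tree, pruning_set)
pruned = prune(tree, pruning_set)
errors_after = errors(pruned, pruning_set)
```

In this toy case the `credit_rating` subtree is collapsed into a single leaf while the root test on `student` is kept. Every subtree must be evaluated against the pruning set, which hints at why postpruning costs more computation than simply halting tree growth early.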
Although pruned trees tend to be more compact than their unpruned counterparts,
they may still be rather large and complex. Decision trees can suffer from repetition
and replication (Figure 8.7), making them overwhelming to interpret. Repetition occurs
when an attribute is repeatedly tested along a given branch of the tree (e.g., “age < 60?,”
[Figure 8.7 image omitted. Panel (a): a tree that tests attribute A1 against 60, then 45, then 50 along one branch, with leaves class A and class B. Panel (b): a tree rooted at a test on age (“youth?”), with “student?”, “credit_rating?”, and “income?” nodes and leaves class A, class B, and class C; the “credit_rating?” subtree appears twice.]
Figure 8.7 An example of: (a) subtree repetition, where an attribute is repeatedly tested along a given
branch of the tree (e.g., age) and (b) subtree replication, where duplicate subtrees exist
within a tree (e.g., the subtree headed by the node “credit_rating?”).
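Both problems can be detected mechanically. The following is a minimal sketch, assuming a hypothetical `Node` class in which each internal node stores the attribute it tests, a test description, and a mapping from outcomes to subtrees. The two example trees are loosely modeled on Figure 8.7, and their leaf labels are illustrative only. Repetition is found by tracking which attributes have already been tested on the path from the root; replication is found by giving every subtree a canonical signature and counting signatures that occur more than once.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple, Union


@dataclass
class Node:
    """Hypothetical internal node: tests `attribute`; `children` maps outcome -> subtree."""
    attribute: str
    test: str
    children: Dict[str, "Tree"] = field(default_factory=dict)


Tree = Union[Node, str]  # a leaf is just a class label


def repeated_attributes(tree: Tree, path: Tuple[str, ...] = ()) -> Set[str]:
    """Attributes tested more than once along some root-to-leaf path (repetition)."""
    if isinstance(tree, str):
        return set()
    found = {tree.attribute} if tree.attribute in path else set()
    for child in tree.children.values():
        found |= repeated_attributes(child, path + (tree.attribute,))
    return found


def signature(tree: Tree) -> tuple:
    """Canonical hashable form of a subtree; equal signatures mean identical subtrees."""
    if isinstance(tree, str):
        return ("leaf", tree)
    kids = tuple(sorted((outcome, signature(child))
                        for outcome, child in tree.children.items()))
    return ("node", tree.attribute, tree.test, kids)


def replicated_subtrees(tree: Tree) -> List[str]:
    """Attributes heading internal subtrees that occur more than once (replication)."""
    counts: Counter = Counter()
    heads: Dict[tuple, str] = {}

    def visit(t: Tree) -> None:
        if isinstance(t, str):
            return
        sig = signature(t)
        counts[sig] += 1
        heads[sig] = t.attribute
        for child in t.children.values():
            visit(child)

    visit(tree)
    return sorted({heads[sig] for sig, n in counts.items() if n > 1})


# (a) Repetition: A1 is tested three times along one branch.
tree_a = Node("A1", "< 60", {
    "yes": Node("A1", "< 45", {
        "yes": "class A",
        "no": Node("A1", "< 50", {"yes": "class B", "no": "class A"}),
    }),
    "no": "class B",
})


def income_subtree() -> Node:
    return Node("income", "=", {"low": "class A", "med": "class B", "high": "class C"})


def credit_subtree() -> Node:
    return Node("credit_rating", "=", {"excellent": income_subtree(), "fair": "class A"})


# (b) Replication: the credit_rating subtree appears under two different branches.
tree_b = Node("age", "= youth", {
    "yes": Node("student", "=", {"yes": "class B", "no": credit_subtree()}),
    "no": credit_subtree(),
})

print(repeated_attributes(tree_a))  # attributes repeated on a path in tree (a)
print(replicated_subtrees(tree_b))  # heads of duplicated subtrees in tree (b)
```

Note that when a duplicated subtree itself contains internal nodes, those nested nodes are reported as well (here, both `credit_rating` and `income`). Reporting only the largest duplicated subtrees would need an extra filtering step that this sketch leaves out.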
 