In order to use the above correction, the values of p and k should be selected. It is possible to use p = 1/|dom(y)| and k = |dom(y)|. Some are using k = 2 and p = 1/2 in any case, even if |dom(y)| > 2, in order to emphasize the fact that the estimated event is always compared to the opposite event, while others are using k = |dom(y)|·|dom(y)|/|S| and p = 1/|dom(y)|.
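The correction formula itself is referenced rather than reproduced here; assuming the standard Laplace form (n_c + k·p)/(n + k), where n_c counts the instances of the class in question and n = |S|, the parameter choices above can be sketched as:

```python
def laplace_corrected(n_c, n, k, p):
    """Laplace-corrected probability estimate: (n_c + k*p) / (n + k)."""
    return (n_c + k * p) / (n + k)

# With p = 1/|dom(y)| and k = |dom(y)|, an empty sample (n = 0)
# falls back to the uniform prior 1/|dom(y)|.
dom_y = 3          # |dom(y)|: number of class values (illustrative)
p = 1 / dom_y
k = dom_y
print(laplace_corrected(0, 0, k, p))   # 0.333... — the prior
print(laplace_corrected(5, 10, k, p))  # 6/13, pulled from 0.5 toward 1/3
```

Note how the estimate shrinks raw frequencies toward the prior p, with k controlling the strength of the shrinkage.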
3.4.2 No Match
According to Clark and Niblett (1989), only zero probabilities are corrected and replaced by the following value: p_a/|S|. Kohavi et al. (1997) suggest using p_a = 0.5. They also empirically compared the Laplace correction and the no-match correction and indicate that there is no significant difference between them. However, both of them are significantly better than not performing any correction at all.
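A minimal sketch of the no-match rule under the assumptions above (only zero frequencies are replaced, with p_a = 0.5 as suggested by Kohavi et al.):

```python
def no_match_estimate(n_c, n, p_a=0.5):
    """No-match correction: only a zero frequency is replaced by p_a / |S|.

    n_c -- count of instances with the class value in question
    n   -- sample size |S|
    """
    if n_c == 0:
        return p_a / n
    return n_c / n

print(no_match_estimate(0, 20))   # 0.025 instead of 0.0
print(no_match_estimate(7, 20))   # 0.35; nonzero counts are untouched
```

Unlike the Laplace correction, nonzero estimates are left exactly as the raw frequencies.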
3.5 Algorithmic Framework for Decision Trees
Decision tree inducers are algorithms that automatically construct a
decision tree from a given dataset. Typically, the goal is to find the optimal
decision tree by minimizing the generalization error. However, other target
functions can be also defined, for instance, minimizing the number of nodes
or minimizing the average depth of the tree.
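These alternative target functions are straightforward to evaluate on a given tree; a hypothetical sketch with a minimal node type (names are illustrative, not from the text):

```python
class Node:
    """A bare tree node; a leaf is a node with no children."""
    def __init__(self, children=()):
        self.children = list(children)

def count_nodes(node):
    """Total number of nodes in the tree."""
    return 1 + sum(count_nodes(c) for c in node.children)

def leaf_depths(node, depth=0):
    """Depths of all leaves; their mean is one notion of average depth."""
    if not node.children:
        return [depth]
    depths = []
    for c in node.children:
        depths.extend(leaf_depths(c, depth + 1))
    return depths

# Root with one internal child (two leaves) and one leaf child.
tree = Node([Node([Node(), Node()]), Node()])
print(count_nodes(tree))                 # 5
ds = leaf_depths(tree)
print(sum(ds) / len(ds))                 # (2 + 2 + 1) / 3 = 1.666...
```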
Induction of an optimal decision tree from given data is considered
to be a difficult task. Hancock et al. (1996) have shown that finding a
minimal decision tree consistent with the training set is NP-hard while
Hyafil and Rivest (1976) have demonstrated that constructing a minimal
binary tree with respect to the expected number of tests required for
classifying an unseen instance is NP-complete. Even finding the minimal
equivalent decision tree for a given decision tree [ Zantema and Bodlaender
(2000) ] or building the optimal decision tree from decision tables is known
to be NP-hard [ Naumov (1991) ] .
These results indicate that using optimal decision tree algorithms
is feasible only for small problems. Consequently, heuristic methods are
required for solving the problem. Roughly speaking, these methods can be
divided into two groups: top-down and bottom-up with clear preference in
the literature to the first group.
There are various top-down decision tree inducers such as ID3
[Quinlan (1986)], C4.5 [Quinlan (1993)] and CART [Breiman et al. (1984)].
Some inducers consist of two conceptual phases: Growing and Pruning (C4.5
and CART). Other inducers perform only the growing phase.
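The growing phase can be sketched generically as follows (an illustrative outline only, not the book's Figure 3.1; `choose_split` is a placeholder for an inducer-specific criterion such as ID3's information gain or CART's Gini index):

```python
from collections import Counter

def choose_split(instances, attributes, target):
    # Placeholder criterion: pick the first attribute. Real inducers
    # rank attributes by an impurity measure (information gain, Gini, ...).
    return attributes[0]

def grow_tree(instances, attributes, target):
    """Generic top-down (TDIDT) growing: recursively split until pure."""
    labels = [inst[target] for inst in instances]
    # Stopping criterion: pure node or no attributes left -> majority leaf.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    attr = choose_split(instances, attributes, target)
    subtree = {}
    for value in set(inst[attr] for inst in instances):
        subset = [inst for inst in instances if inst[attr] == value]
        remaining = [a for a in attributes if a != attr]
        subtree[value] = grow_tree(subset, remaining, target)
    return {attr: subtree}

data = [{"outlook": "sunny", "play": "no"},
        {"outlook": "rain", "play": "yes"}]
print(grow_tree(data, ["outlook"], "play"))
# {'outlook': {'sunny': 'no', 'rain': 'yes'}}
```

Pruning, when performed, is a separate pass over the grown tree and is omitted from this sketch.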
Figure 3.1 presents typical pseudocode for a top-down algorithm
for inducing a decision tree using growing and pruning. Note that these