Popular Decision Trees Induction Algorithms - Data Mining with Decision Trees: Theory and Applications

threshold. This procedure also stops when one of the following conditions is fulfilled:

(1) The maximum tree depth is reached.

(2) The minimum number of cases required for a node to be a parent is reached, so the node cannot be split any further.

(3) The minimum number of cases required for a node to be a child node is reached.
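The three stopping conditions above can be sketched as a single check. This is a minimal illustration, not CHAID's actual implementation: the names `StoppingRules` and `may_split` and the default limits are hypothetical, and the sketch assumes the sizes of the candidate child nodes are already known.

```python
from dataclasses import dataclass


@dataclass
class StoppingRules:
    # Hypothetical limits chosen only for illustration.
    max_depth: int = 3
    min_parent_cases: int = 100
    min_child_cases: int = 50


def may_split(depth, n_cases, child_sizes, rules):
    """Return True only if none of the three stopping conditions applies."""
    # (1) The maximum tree depth is reached.
    if depth >= rules.max_depth:
        return False
    # (2) The node holds too few cases to act as a parent.
    if n_cases < rules.min_parent_cases:
        return False
    # (3) At least one candidate child would hold too few cases.
    if any(size < rules.min_child_cases for size in child_sizes):
        return False
    return True
```

A node at depth 1 with 200 cases and candidate children of 120 and 80 cases would still be split under these limits, while the same node at depth 3 would not.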

CHAID handles missing values by treating them all as a single valid category. CHAID does not perform pruning.
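Treating missing values as one category can be illustrated with a short sketch. It assumes that missing entries are represented by `None`; the label `MISSING` and the helper `categories_with_missing` are hypothetical names introduced here.

```python
from collections import Counter

MISSING = "<missing>"  # hypothetical label for the extra category


def categories_with_missing(values):
    """Map every missing entry (None) to one shared category label."""
    return [MISSING if v is None else v for v in values]


# All missing entries are counted together as one ordinary category.
counts = Counter(categories_with_missing(["a", None, "b", None, "a"]))
```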

7.6 QUEST

The Quick, Unbiased, Efficient Statistical Tree (QUEST) algorithm supports univariate and linear combination splits [Loh and Shih (1997)]. For each split, the association between each input attribute and the target attribute is computed using the ANOVA F-test or Levene's test (for ordinal and continuous attributes) or Pearson's chi-square (for nominal attributes). An ANOVA F-statistic is computed for each attribute. If the largest F-statistic exceeds a predefined threshold value, the attribute with the largest F-value is selected to split the node. Otherwise, Levene's test for unequal variances is computed for every attribute. If the largest Levene statistic is greater than a predefined threshold value, the attribute with the largest Levene value is used to split the node. If no attribute exceeds either threshold, the node is split using the attribute with the largest ANOVA F-value.
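The selection rule for ordinal and continuous attributes can be sketched as follows. This is a simplified illustration, not the reference QUEST implementation: it compares raw statistics against thresholds exactly as described above, it omits the chi-square branch for nominal attributes, and the names `anova_f`, `levene_f`, `split_by_class` and `select_attribute` are hypothetical. Levene's statistic is computed here as the ANOVA F-statistic of the absolute deviations from each class mean.

```python
from statistics import fmean


def anova_f(groups):
    """One-way ANOVA F-statistic for a list of numeric groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = fmean([x for g in groups for x in g])
    means = [fmean(g) for g in groups]
    between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if within == 0:
        return float("inf")  # no within-class spread at all
    return (between / (k - 1)) / (within / (n - k))


def levene_f(groups):
    """Levene's statistic: ANOVA F on absolute deviations from class means."""
    means = [fmean(g) for g in groups]
    deviations = [[abs(x - m) for x in g] for g, m in zip(groups, means)]
    return anova_f(deviations)


def split_by_class(values, labels):
    """Group the attribute values by their target class."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    return list(groups.values())


def select_attribute(columns, labels, f_threshold, levene_threshold):
    """Return (attribute name, rule that selected it)."""
    f_scores = {name: anova_f(split_by_class(col, labels))
                for name, col in columns.items()}
    best_f = max(f_scores, key=f_scores.get)
    if f_scores[best_f] > f_threshold:
        return best_f, "anova"
    lev_scores = {name: levene_f(split_by_class(col, labels))
                  for name, col in columns.items()}
    best_lev = max(lev_scores, key=lev_scores.get)
    if lev_scores[best_lev] > levene_threshold:
        return best_lev, "levene"
    # Neither threshold exceeded: fall back to the largest ANOVA F-value.
    return best_f, "fallback"
```

An attribute whose class means coincide but whose class variances differ has an F-statistic of zero, so it can only be picked up by the Levene step, which is the reason for the second test.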

If the target attribute is multinomial, two-means clustering is used to create two super-classes. The attribute that obtains the highest association with the target attribute is selected for splitting. Quadratic Discriminant Analysis (QDA) is applied to find the optimal splitting point for the input attribute. QUEST has negligible bias and yields a binary decision tree. Ten-fold cross-validation is used to prune the trees.
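The grouping of several classes into two super-classes can be illustrated with a one-dimensional two-means sketch. Several details here are assumptions not stated in the text: the clustering is applied to the per-class means of a single numeric attribute, the two initial centers are the smallest and largest class means, and the name `two_means_superclasses` is hypothetical.

```python
from statistics import fmean


def two_means_superclasses(class_means, max_iter=100):
    """Assign each class to super-class 0 or 1 by 1-D two-means clustering.

    class_means maps a class label to the mean of one numeric attribute.
    """
    # Assumed initialization: the two most extreme class means.
    low = min(class_means.values())
    high = max(class_means.values())
    assignment = {}
    for _ in range(max_iter):
        # Assign every class to the nearer of the two centers.
        new_assignment = {
            label: 0 if abs(mean - low) <= abs(mean - high) else 1
            for label, mean in class_means.items()
        }
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        group0 = [class_means[c] for c, g in assignment.items() if g == 0]
        group1 = [class_means[c] for c, g in assignment.items() if g == 1]
        # Recompute each center, keeping the old one if a group is empty.
        if group0:
            low = fmean(group0)
        if group1:
            high = fmean(group1)
    return assignment
```

With class means of 1.0, 1.5, 9.0 and 10.0, the first two classes form one super-class and the last two form the other, after which the split point is sought between two groups as in the two-class case.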

7.7 Reference to Other Algorithms

Table 7.1 describes other decision tree algorithms available in the literature. Although many other algorithms are not included in this table, most are variations of the algorithmic framework presented in the previous sections. A thorough comparison of the above algorithms and many others was conducted in [Lim et al. (2000)].
