threshold. The procedure also stops when one of the following conditions is
fulfilled:
(1) Maximum tree depth is reached.
(2) The node holds fewer cases than the minimum required for a parent
node, so it cannot be split any further.
(3) A split would produce a child node with fewer cases than the minimum
required for a child node.
CHAID handles missing values by treating them all as a single valid
category. CHAID does not perform pruning.
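The three stopping conditions above can be sketched as a single check. This is a minimal illustration, not CHAID's actual implementation: the names StoppingRules and can_split, and the default limits, are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class StoppingRules:
    # Hypothetical default limits; real tools let the user configure these.
    max_depth: int = 3
    min_parent_cases: int = 100
    min_child_cases: int = 50


def can_split(depth, n_cases, child_sizes, rules):
    """Return True if a node may be split under the three stopping rules.

    depth       -- depth of the node being considered (root = 0)
    n_cases     -- number of cases in the node
    child_sizes -- number of cases each proposed child would receive
    """
    if depth >= rules.max_depth:          # (1) maximum tree depth reached
        return False
    if n_cases < rules.min_parent_cases:  # (2) too few cases to be a parent
        return False
    if any(size < rules.min_child_cases for size in child_sizes):
        return False                      # (3) a child would be too small
    return True
```

For instance, with the assumed defaults, a root node of 200 cases split into children of 120 and 80 cases passes all three checks, while the same split at depth 3 does not.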
7.6 QUEST
The Quick, Unbiased, Efficient Statistical Tree (QUEST) algorithm supports
univariate and linear combination splits [Loh and Shih (1997)]. For
each split, the association between each input attribute and the target
attribute is computed using the ANOVA F-test or Levene's test (for
ordinal and continuous attributes) or Pearson's chi-square (for nominal
attributes). An ANOVA F-statistic is computed for each attribute. If
the largest F-statistic exceeds a predefined threshold value, the attribute
with the largest F-value is selected to split the node. Otherwise, Levene's
test for unequal variances is computed for every attribute. If the largest
Levene's statistic value is greater than a predefined threshold value, the
attribute with the largest Levene value is used to split the node. If no
attribute exceeds either threshold, the node is split using the attribute
with the largest ANOVA F-value.
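The selection rule for ordinal and continuous attributes can be sketched as follows. This is a simplified illustration that follows the description above and compares raw statistics with thresholds. The function names, the input layout (each attribute's values already grouped by target class), and the use of absolute deviations from the class mean for Levene's statistic are assumptions of this sketch; it also assumes every attribute has some within-class variation.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F-statistic for a list of groups of numeric values."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def levene_f(groups):
    """Levene's statistic: ANOVA F on absolute deviations from group means."""
    deviations = [[abs(x - mean(g)) for x in g] for g in groups]
    return anova_f(deviations)


def select_attribute(data, f_threshold, levene_threshold):
    """Pick the split attribute.

    data maps an attribute name to its values grouped by target class,
    e.g. {"age": [[values for class 0], [values for class 1]]}.
    Returns (attribute name, rule that selected it).
    """
    f_scores = {name: anova_f(groups) for name, groups in data.items()}
    best_f = max(f_scores, key=f_scores.get)
    if f_scores[best_f] > f_threshold:
        return best_f, "anova"
    lev_scores = {name: levene_f(groups) for name, groups in data.items()}
    best_lev = max(lev_scores, key=lev_scores.get)
    if lev_scores[best_lev] > levene_threshold:
        return best_lev, "levene"
    # Neither threshold exceeded: fall back to the largest ANOVA F-value.
    return best_f, "fallback"
```

An attribute whose class means differ is caught by the F-test; one whose class means coincide but whose class variances differ can only be caught by the Levene step.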
If the target attribute is multinomial, two-means clustering is used to
create two super-classes. The attribute that obtains the highest association
with the target attribute is selected for splitting. Quadratic Discriminant
Analysis (QDA) is applied to find the optimal splitting point for the input
attribute. QUEST has negligible attribute-selection bias and yields a binary
decision tree.
Ten-fold cross-validation is used to prune the trees.
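The super-class step for a multinomial target can be illustrated with a one-dimensional two-means clustering of the class means of the selected attribute. This is only a sketch: the function name, the input format, and the choice of the two most extreme class means as initial centers are assumptions of the example, and the subsequent QDA split-point search is not shown.

```python
def two_means_superclasses(class_means, max_iter=100):
    """Group class labels into two super-classes by 1-D two-means.

    class_means maps each class label to the mean of the selected
    attribute within that class. Returns two sets of labels.
    """
    labels = sorted(class_means, key=class_means.get)
    # Assumed initialisation: the two most extreme class means.
    c0, c1 = class_means[labels[0]], class_means[labels[-1]]
    low, high = list(labels), []
    for _ in range(max_iter):
        # Assign each class to the nearer center (ties go to the low group).
        low = [l for l in labels
               if abs(class_means[l] - c0) <= abs(class_means[l] - c1)]
        high = [l for l in labels if l not in low]
        if not low or not high:
            break  # all class means coincide; no meaningful grouping
        new_c0 = sum(class_means[l] for l in low) / len(low)
        new_c1 = sum(class_means[l] for l in high) / len(high)
        if (new_c0, new_c1) == (c0, c1):
            break  # centers stable: converged
        c0, c1 = new_c0, new_c1
    return set(low), set(high)
```

With class means of 1.0, 1.5, 9.0 and 10.0 for classes A to D, the sketch groups A and B into one super-class and C and D into the other, turning the problem into a two-class one.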
7.7 Reference to Other Algorithms
Table 7.1 describes other decision tree algorithms available in the literature.
Many further algorithms are not included in this table, but most are
variations of the algorithmic framework presented in the previous
sections. A thorough comparison of the above algorithms and many others
has been conducted in [Lim et al. (2000)].