5.1.13 Kolmogorov-Smirnov Criterion
A binary criterion that uses the Kolmogorov-Smirnov distance has been
proposed by [Friedman (1977)] and [Rounds (1980)]. Assuming a binary
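target attribute, namely $dom(y) = \{c_1, c_2\}$, the criterion is defined as:

$$
KS(a_i, dom_1(a_i), dom_2(a_i), S) = \left| \frac{\left|\sigma_{a_i \in dom_1(a_i) \text{ AND } y = c_1} S\right|}{\left|\sigma_{y = c_1} S\right|} - \frac{\left|\sigma_{a_i \in dom_1(a_i) \text{ AND } y = c_2} S\right|}{\left|\sigma_{y = c_2} S\right|} \right| \tag{5.14}
$$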
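As a minimal sketch of Eq. (5.14), the criterion can be computed as the absolute difference between the fraction of $c_1$ instances whose attribute value falls in $dom_1(a_i)$ and the corresponding fraction of $c_2$ instances. The function name `ks_criterion` and its parameters are hypothetical and chosen only for illustration; the sketch assumes a nominal attribute given as a plain list of values.

```python
def ks_criterion(values, labels, dom1, c1, c2):
    """Kolmogorov-Smirnov distance of a binary split (illustrative sketch).

    values: attribute value of each instance in S
    labels: class label of each instance in S (c1 or c2)
    dom1:   set of attribute values sent to the first branch
    """
    n_c1 = sum(1 for y in labels if y == c1)  # |sigma_{y=c1} S|
    n_c2 = sum(1 for y in labels if y == c2)  # |sigma_{y=c2} S|
    if n_c1 == 0 or n_c2 == 0:
        raise ValueError("both classes must be present in S")

    # Instances of each class whose attribute value falls in dom1.
    in_dom1_c1 = sum(1 for v, y in zip(values, labels) if v in dom1 and y == c1)
    in_dom1_c2 = sum(1 for v, y in zip(values, labels) if v in dom1 and y == c2)

    return abs(in_dom1_c1 / n_c1 - in_dom1_c2 / n_c2)


# Toy data: six instances, attribute values "a"/"b", classes 1/0.
values = ["a", "a", "b", "b", "a", "b"]
labels = [1, 1, 1, 0, 0, 0]
score = ks_criterion(values, labels, dom1={"a"}, c1=1, c2=0)
```

A value close to 1 means the split sends the two classes to different branches almost completely, while a value of 0 means both classes are distributed identically across the branches.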
This measure was extended by [Utgoff and Clouse (1996)] to handle
target attributes with multiple classes and missing data values. Their results
indicate that the suggested method outperforms the gain ratio criterion.
5.1.14 AUC Splitting Criteria
In Section 4.2.6.6, we have shown that the AUC metric can be used for
evaluating the predictive performance of classifiers. The AUC metric can
also be used as a splitting criterion [Ferri et al. (2002)]. The attribute
that obtains the maximal area under the convex hull of the ROC curve
is selected. It has been shown that the AUC-based splitting criterion
outperforms other splitting criteria both with respect to classification
accuracy and area under the ROC curve. It is important to note that, unlike
impurity criteria, this criterion does not compare the impurity of the
parent node with the weighted impurity of the children after
splitting.
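The idea can be illustrated with a short sketch for a binary target. It is not taken from [Ferri et al. (2002)]; it assumes that each child node of a candidate split is treated as a leaf, that the leaves are ordered by decreasing proportion of positive instances (which makes the resulting ROC curve convex), and that the area is obtained with the trapezoid rule. The function name `auc_of_split`, the input format and the attribute names are hypothetical.

```python
def auc_of_split(children):
    """Area under the convex ROC curve induced by a candidate split.

    children: list of (positives, negatives) counts, one pair per child node.
    """
    total_pos = sum(p for p, _ in children)
    total_neg = sum(n for _, n in children)
    if total_pos == 0 or total_neg == 0:
        raise ValueError("both classes must be present in the parent node")

    def positive_rate(child):
        p, n = child
        return p / (p + n) if (p + n) > 0 else 0.0

    # Sorting by decreasing positive rate yields segments of decreasing
    # slope, i.e. the convex hull of the ROC points for this split.
    ordered = sorted(children, key=positive_rate, reverse=True)

    area, tpr = 0.0, 0.0
    for p, n in ordered:
        new_tpr = tpr + p / total_pos
        # Trapezoid whose width is this child's share of the negatives.
        area += (n / total_neg) * (tpr + new_tpr) / 2
        tpr = new_tpr
    return area


# Choose the attribute whose split has the largest area (toy counts).
candidates = {
    "attr_a": [(8, 2), (2, 8)],
    "attr_b": [(5, 5), (5, 5)],
}
best = max(candidates, key=lambda name: auc_of_split(candidates[name]))
```

Only the class counts of the children enter the computation, so no parent impurity is needed, which matches the remark above.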
5.1.15 Other Univariate Splitting Criteria
Additional univariate splitting criteria can be found in the literature, such
as the permutation statistic [Li and Dubes (1986)]; mean posterior improvement
[Taylor and Silverman (1993)]; and the hypergeometric distribution measure
[Martin (1997)].
5.1.16 Comparison of Univariate Splitting Criteria
Over the past 30 years, several researchers have conducted comparative
studies of splitting criteria, both those described above and others. Among
these researchers are [Breiman (1996)]; [Baker and Jain (1976)]; [Mingers
(1989)]; [Fayyad and Irani (1992)]; [Buntine and Niblett (1992)]; [Loh and
Shih (1997)]; [Loh and Shih (1999)]; and [Lim et al. (2000)]. The majority