Evaluation of Classification Trees - Data Mining with Decision Trees: Theory and Applications


Table 4.6 McNemar's test: contingency table.

Number of examples misclassified by both classifiers ($n_{00}$) | Number of examples misclassified by $\hat{f}_A$ but not by $\hat{f}_B$ ($n_{01}$)

Number of examples misclassified by $\hat{f}_B$ but not by $\hat{f}_A$ ($n_{10}$) | Number of examples misclassified neither by $\hat{f}_A$ nor by $\hat{f}_B$ ($n_{11}$)

Table 4.7 Expected counts under $H_0$.

$n_{00}$ | $(n_{01} + n_{10})/2$

$(n_{01} + n_{10})/2$ | $n_{11}$

training set and the result is two classifiers. These classifiers are tested on $T$ and for each example $x \in T$ we record how it was classified. Thus, the contingency table presented in Table 4.6 is constructed.
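
The counting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `mcnemar_table` and the three parallel sequences (true labels and the predictions of the two classifiers on the test set) are hypothetical names introduced here, and the cell names follow the layout of Table 4.6.

```python
from collections import Counter


def mcnemar_table(y_true, pred_a, pred_b):
    """Count the four cells of the contingency table (Table 4.6 layout).

    n00: misclassified by both classifiers
    n01: misclassified by A but not by B
    n10: misclassified by B but not by A
    n11: misclassified by neither
    """
    if not (len(y_true) == len(pred_a) == len(pred_b)):
        raise ValueError("all three sequences must have the same length")

    counts = Counter()
    for truth, a, b in zip(y_true, pred_a, pred_b):
        a_wrong = a != truth
        b_wrong = b != truth
        if a_wrong and b_wrong:
            counts["n00"] += 1
        elif a_wrong:
            counts["n01"] += 1
        elif b_wrong:
            counts["n10"] += 1
        else:
            counts["n11"] += 1
    # Counter returns 0 for cells that never occurred.
    return {cell: counts[cell] for cell in ("n00", "n01", "n10", "n11")}


# Toy data (made up for illustration): six test examples, binary labels.
y_true = [1, 1, 0, 0, 1, 0]
pred_a = [1, 0, 0, 1, 1, 1]
pred_b = [1, 1, 0, 1, 0, 1]
table = mcnemar_table(y_true, pred_a, pred_b)
```

Only the two off-diagonal cells, `n01` and `n10`, carry information about a difference between the classifiers; the other two cells record cases where both agree.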

Under the null hypothesis $H_0$, the two inducers should have the same error rate. McNemar's test is based on a $\chi^2$ test for goodness-of-fit that compares the distribution of counts expected under the null hypothesis to the observed counts. The expected counts under $H_0$ are presented in Table 4.7.
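
As a sketch of why only the off-diagonal cells matter, the usual goodness-of-fit sum of (observed − expected)² / expected can be worked out for these expected counts. The shorthand $m$ is introduced here for convenience and is not part of the original notation. The diagonal cells contribute nothing because their observed and expected counts coincide, so before any continuity correction is applied:

$$
\begin{aligned}
\sum \frac{(O - E)^2}{E}
&= \frac{(n_{01} - m)^2}{m} + \frac{(n_{10} - m)^2}{m},
\qquad m = \frac{n_{01} + n_{10}}{2} \\
&= \frac{2}{m}\left(\frac{n_{01} - n_{10}}{2}\right)^2 \\
&= \frac{(n_{01} - n_{10})^2}{n_{01} + n_{10}}
\end{aligned}
$$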

Under the null hypothesis, the following statistic, $s$, is distributed approximately as $\chi^2$ with 1 degree of freedom. It incorporates a “continuity correction” term (of $-1$ in the numerator) to account for the fact that the statistic is discrete while the $\chi^2$ distribution is continuous:

$$
s = \frac{\left(|n_{10} - n_{01}| - 1\right)^2}{n_{10} + n_{01}} \tag{4.23}
$$
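
A minimal sketch of the continuity-corrected statistic follows. The function name `mcnemar_statistic` is hypothetical, and raising an error when both off-diagonal counts are zero is an assumption made here, since the statistic is undefined in that case and the text does not say how to treat it.

```python
def mcnemar_statistic(n01, n10):
    """Continuity-corrected McNemar statistic: (|n10 - n01| - 1)^2 / (n10 + n01)."""
    discordant = n01 + n10
    if discordant == 0:
        # Assumption: with no disagreements the statistic is undefined.
        raise ValueError("n01 + n10 must be positive")
    return (abs(n10 - n01) - 1) ** 2 / discordant


# Made-up counts for illustration: A alone errs on 10 examples, B alone on 25.
s = mcnemar_statistic(n01=10, n10=25)  # (|25 - 10| - 1)^2 / 35 = 196 / 35 = 5.6
```

Note that the counts $n_{00}$ and $n_{11}$ do not appear: examples on which both classifiers agree do not affect the statistic.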

According to the probabilistic theory [Athanasopoulos, 1991], if the null hypothesis is correct, the probability that the value of the statistic, $s$, is greater than $\chi^2_{1,0.95}$ is less than 0.05, i.e. $P(s > \chi^2_{1,0.95}) < 0.05$. Then, to compare the inducers A and B, the induced classifiers $\hat{f}_A$ and $\hat{f}_B$ are tested on $T$ and the value of $s$ is estimated as described above. Then, if $s > \chi^2_{1,0.95}$, the null hypothesis could be rejected in favor of the hypothesis that the two inducers have different performance when trained on the particular training set $R$.
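
The decision rule can be sketched as follows. The 0.95 quantile of the $\chi^2$ distribution with 1 degree of freedom is approximately 3.841, so the rule amounts to rejecting $H_0$ when $s$ exceeds that value. The function name `mcnemar_decision` and the `alpha` parameter are hypothetical; the sketch uses the identity that, for 1 degree of freedom, the upper tail probability of $\chi^2$ at $s$ equals `erfc(sqrt(s / 2))`, which avoids any dependency beyond the standard library.

```python
import math


def mcnemar_decision(s, alpha=0.05):
    """Return (p_value, reject) for a chi-square statistic with 1 degree of freedom.

    For 1 degree of freedom, P(X > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)),
    where Z is a standard normal variable.
    """
    if s < 0:
        raise ValueError("the statistic cannot be negative")
    p_value = math.erfc(math.sqrt(s / 2.0))
    return p_value, p_value < alpha


# Made-up statistic values for illustration.
p_large, reject_large = mcnemar_decision(5.6)   # above 3.841, so H0 is rejected
p_small, reject_small = mcnemar_decision(3.0)   # below 3.841, so H0 is not rejected
```

Comparing the p-value with `alpha` is equivalent to comparing $s$ with the critical value; the p-value form is used here only because it generalizes to other significance levels without a lookup table.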

The shortcomings of this test are:

(1) It does not directly measure variability due to the choice of the training set or the internal randomness of the inducer. The inducers are compared using a single training set $R$. Thus, McNemar's test should be applied only if we believe that these sources of variability are small.
