Table 4.6
McNemar's test: contingency table.

Number of examples misclassified by both classifiers ($n_{00}$) | Number of examples misclassified by $\hat{f}_A$ but not by $\hat{f}_B$ ($n_{01}$)
Number of examples misclassified by $\hat{f}_B$ but not by $\hat{f}_A$ ($n_{10}$) | Number of examples misclassified neither by $\hat{f}_A$ nor by $\hat{f}_B$ ($n_{11}$)

Table 4.7
Expected counts under $H_0$.

$n_{00}$ | $(n_{01} + n_{10})/2$
$(n_{01} + n_{10})/2$ | $n_{11}$

training set and the result is two classifiers. These classifiers are tested on $T$ and for each example $x \in T$ we record how it was classified. Thus, the contingency table presented in Table 4.6 is constructed.
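
The tallying step can be sketched as follows. This is a minimal illustration, not the book's code: the function name `contingency_counts` and the sample labels are hypothetical, and the cell naming follows Table 4.6 (first index refers to $\hat{f}_A$, second to $\hat{f}_B$, with 0 meaning "misclassified").

```python
def contingency_counts(y_true, pred_a, pred_b):
    """Tally the four cells of Table 4.6 from the true labels and the
    predictions of two classifiers (hypothetical helper)."""
    n00 = n01 = n10 = n11 = 0
    for y, a, b in zip(y_true, pred_a, pred_b):
        a_wrong = a != y
        b_wrong = b != y
        if a_wrong and b_wrong:
            n00 += 1          # misclassified by both classifiers
        elif a_wrong:
            n01 += 1          # misclassified by A but not by B
        elif b_wrong:
            n10 += 1          # misclassified by B but not by A
        else:
            n11 += 1          # misclassified by neither
    return n00, n01, n10, n11


# Made-up test set of six examples, for illustration only.
y_true = [1, 0, 1, 1, 0, 1]
pred_a = [0, 0, 1, 0, 1, 1]
pred_b = [0, 0, 0, 1, 1, 1]
print(contingency_counts(y_true, pred_a, pred_b))  # (2, 1, 1, 2)
```

Only the off-diagonal cells $n_{01}$ and $n_{10}$ carry information about a difference between the two classifiers; the other two cells count examples on which they agree.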
The two inducers should have the same error rate under the null hypothesis $H_0$. McNemar's test is based on a $\chi^2$ test for goodness-of-fit that compares the distribution of counts expected under the null hypothesis to the observed counts. The expected counts under $H_0$ are presented in Table 4.7.
The following statistic, $s$, is distributed as $\chi^2$ with 1 degree of freedom. It incorporates a "continuity correction" term (of $-1$ in the numerator) to account for the fact that the statistic is discrete while the $\chi^2$ distribution is continuous:

$$ s = \frac{\left(|n_{10} - n_{01}| - 1\right)^2}{n_{10} + n_{01}}. \qquad (4.23) $$
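
As a sketch of where this form comes from (a standard derivation that the text does not spell out), one can apply the usual goodness-of-fit sum of (observed minus expected) squared over expected to the cells of Table 4.7. The diagonal cells contribute nothing because their expected counts equal the observed ones, leaving only the two off-diagonal terms:

```latex
\begin{align*}
\chi^2
  &= \frac{\left(n_{01} - \frac{n_{01}+n_{10}}{2}\right)^2}{\frac{n_{01}+n_{10}}{2}}
   + \frac{\left(n_{10} - \frac{n_{01}+n_{10}}{2}\right)^2}{\frac{n_{01}+n_{10}}{2}} \\
  &= 2 \cdot \frac{\left(\frac{n_{01}-n_{10}}{2}\right)^2}{\frac{n_{01}+n_{10}}{2}}
   = \frac{(n_{10}-n_{01})^2}{n_{10}+n_{01}} .
\end{align*}
```

Replacing $|n_{10} - n_{01}|$ by $|n_{10} - n_{01}| - 1$ in the numerator then gives the continuity-corrected statistic of Eq. (4.23).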
According to the probabilistic theory [Athanasopoulos, 1991], if the null hypothesis is correct, the probability that the value of the statistic, $s$, is greater than $\chi^2_{1,0.95}$ is less than 0.05, i.e. $P(|s| > \chi^2_{1,0.95}) < 0.05$. Then, to compare the inducers A and B, the induced classifiers $\hat{f}_A$ and $\hat{f}_B$ are tested on $T$ and the value of $s$ is computed as described above. If $|s| > \chi^2_{1,0.95}$, the null hypothesis can be rejected in favor of the hypothesis that the two inducers have different performance when trained on the particular training set $R$.
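
The decision rule can be illustrated with a short sketch. The function names `mcnemar_statistic` and `reject_h0` and the example counts are hypothetical; the critical value 3.841 is the standard table value for the 95th percentile of the $\chi^2$ distribution with 1 degree of freedom, hard-coded here as an assumption so that no statistics library is needed.

```python
# 95th percentile of chi-square with 1 degree of freedom (standard table value).
CHI2_1_095 = 3.841


def mcnemar_statistic(n01, n10):
    """Continuity-corrected McNemar statistic of Eq. (4.23)."""
    if n01 + n10 == 0:
        # The classifiers never disagree on correctness; s is undefined.
        raise ValueError("n01 + n10 must be positive")
    return (abs(n10 - n01) - 1) ** 2 / (n10 + n01)


def reject_h0(n01, n10, critical=CHI2_1_095):
    """Return True if H0 (equal error rates) is rejected at the 0.05 level."""
    return mcnemar_statistic(n01, n10) > critical


# Made-up counts, for illustration only.
print(mcnemar_statistic(10, 25))  # (15 - 1)^2 / 35 = 5.6
print(reject_h0(10, 25))          # True: 5.6 > 3.841
print(reject_h0(12, 15))          # False: (3 - 1)^2 / 27 is about 0.148
```

Note that the cells $n_{00}$ and $n_{11}$ do not enter the statistic at all, so the outcome depends only on the examples where exactly one classifier is wrong.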
The shortcomings of this test are:
(1) It does not directly measure variability due to the choice of the training set or the internal randomness of the inducer. The inducers are compared using a single training set $R$. Thus McNemar's test should only be applied if we consider these sources of variability to be small.