(2) It compares the performance of the inducers on training sets that
are substantially smaller than the whole dataset. Hence, we
must assume that the relative difference observed on these training sets will
still hold for training sets of size equal to the whole dataset.
4.2.7.2 A Test for the Difference of Two Proportions
This statistical test is based on measuring the difference between the
error rates of algorithms A and B [Snedecor and Cochran (1989)]. More
specifically, let $p_A = (n_{00} + n_{01})/n$ be the proportion of test examples
incorrectly classified by algorithm A and let $p_B = (n_{00} + n_{10})/n$ be the
proportion of test examples incorrectly classified by algorithm B. The
assumption underlying this statistical test is that when algorithm A classifies
an example x from the test set T, the probability of misclassification is
$p_A$. Then the number of misclassifications of n test examples is a binomial
random variable with mean $n p_A$ and variance $p_A (1 - p_A) n$.
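As a small illustration of these definitions, the sketch below computes the two error rates and the binomial mean and variance from a set of counts. The counts are made up, and the reading of the cells (n00: misclassified by both, n01: by A only, n10: by B only, n11: by neither) is an assumption chosen to match the formulas for p_A and p_B above.

```python
# Hypothetical contingency counts on a test set. The cell meanings are an
# assumption consistent with p_A = (n00 + n01)/n and p_B = (n00 + n10)/n.
n00 = 10    # misclassified by both A and B
n01 = 15    # misclassified by A only
n10 = 5     # misclassified by B only
n11 = 170   # classified correctly by both

n = n00 + n01 + n10 + n11   # total number of test examples

p_a = (n00 + n01) / n       # error rate of algorithm A
p_b = (n00 + n10) / n       # error rate of algorithm B

# Binomial model for the number of errors A makes on n examples.
mean_errors_a = n * p_a
var_errors_a = p_a * (1 - p_a) * n

print(f"p_A = {p_a}, p_B = {p_b}")
print(f"mean = {mean_errors_a}, variance = {var_errors_a}")
```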
The binomial distribution can be well approximated by a normal distribution
for reasonable values of n. The difference between two independent
normally distributed random variables is itself normally distributed. Thus,
the quantity $p_A - p_B$ can be viewed as normally distributed if we assume
that the measured error rates $p_A$ and $p_B$ are independent. Under the null
hypothesis, $H_0$, the quantity $p_A - p_B$ has a mean of zero and a standard
error of

$$se = \sqrt{\frac{2p\,(1 - p)}{n}}, \qquad p = \frac{p_A + p_B}{2}, \tag{4.24}$$

where p is the average of the two error rates and n is the number of test examples.
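The standard error in Eq. (4.24) can be checked numerically. The following is a minimal sketch, taking p as the average of the two measured error rates; the function name and the input values are hypothetical.

```python
import math


def standard_error(p_a: float, p_b: float, n: int) -> float:
    """Standard error of p_A - p_B under the null hypothesis, Eq. (4.24)."""
    p = (p_a + p_b) / 2                    # average of the two error rates
    return math.sqrt(2 * p * (1 - p) / n)


# Hypothetical values: error rates of 12.5% and 7.5% on 200 test examples.
se = standard_error(0.125, 0.075, 200)
print(se)
```

Because n appears in the denominator under the square root, quadrupling the test set size halves the standard error.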
Based on the above analysis, we obtain the statistic:

$$z = \frac{p_A - p_B}{\sqrt{2p\,(1 - p)/n}}, \tag{4.25}$$
which has a standard normal distribution under the null hypothesis. If the
absolute value of z exceeds $Z_{0.975}$, the probability of incorrectly
rejecting the null hypothesis is less than 0.05. Thus, if $|z| > Z_{0.975} = 1.96$,
the null hypothesis can be rejected in favor of the hypothesis that the
two algorithms have different performances. Two of the most important
problems with this statistic are:
(1) The probabilities p A and p B are measured on the same test set and
thus they are not independent.
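The test in Eq. (4.25) can be sketched in a few lines. This is an illustrative sketch only: the function name, the alpha parameter, and the input values are hypothetical, and the code takes the independence assumption at face value, so the caveat in item (1) applies to its output.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(p_a, p_b, n, alpha=0.05):
    """Return (z, critical value, reject H0?) for the statistic in Eq. (4.25)."""
    p = (p_a + p_b) / 2                      # average error rate
    se = math.sqrt(2 * p * (1 - p) / n)      # standard error under H0
    if se == 0:
        raise ValueError("standard error is zero; z is undefined")
    z = (p_a - p_b) / se
    # Two-sided critical value Z_{1 - alpha/2}; about 1.96 for alpha = 0.05.
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return z, critical, abs(z) > critical


# Hypothetical error rates measured on 200 test examples.
z, critical, reject = two_proportion_z_test(0.125, 0.075, 200)
print(f"z = {z:.3f}, critical = {critical:.3f}, reject H0: {reject}")
```

With these inputs |z| is about 1.67, below the critical value, so the difference between the two error rates would not be declared significant at the 0.05 level.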