Database Reference
In-Depth Information
(2) The test does not measure variation due to the choice of the training
set or the internal variation of the learning algorithm. Also it measures
the performance of the algorithms on training sets of a size significantly
smaller than the whole dataset.
4.2.7.3
The Resampled Paired t Test
The resampled paired t test is the most popular in machine learning.
Usually, there are a series of 30 trials in the test. In each trial, the available
sample S is randomly divided into a training set R (it is typically two
thirds of the data) and a test set T . The algorithms A and B are both
trained on R and the resulting classifiers are tested on T .Let p ( i A and p ( i B
be the observed proportions of test examples misclassified by algorithm A
and B respectively during the i th trial. If we assume that the 30 differences
p ( i ) = p ( i A
p ( i B were drawn independently from a normal distribution, then
we can apply Student's t test by computing the statistic:
· n
P i =1 ( p ( i ) −p ) 2
n− 1
p
t =
,
(4.26)
where p = n · i =1
p ( i ) . Under the null hypothesis, this statistic has a
t distribution with n
1 degrees of freedom. Then for 30 trials, the null
hypothesis could be rejected if
|
t
|
>t 29 , 0 . 975 =2 . 045. The main drawbacks
of this approach are:
(1) Since p ( i A and p ( i B are not independent, the difference p ( i ) will not have
a normal distribution.
(2) The p ( i ) 's are not independent, because the test and training sets in
the trials overlap.
4.2.7.4
The k-fold Cross-validated Paired t Test
This approach is similar to the resampled paired t test except that instead of
constructing each pair of training and test sets by randomly dividing S ,the
dataset is randomly divided into k disjoint sets of equal size, T 1 ,T 2 ,...,T k .
Then k trials are conducted. In each trial, the test set is T i and the training
set is the union of all of the others T j , j
= i .The t statistic is computed
as described in Section 4.2.7.3. The advantage of this approach is that
each test set is independent of the others. However, there is the problem
that the training sets overlap. This overlap may prevent this statistical test
from obtaining a good estimation of the amount of variation that would
Search WWH ::




Custom Search