where precision equals recall. Precision is defined as TruePos/(TruePos + FalsePos), and recall is defined as TruePos/(TruePos + FalseNeg).
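As a concrete illustration, the break-even point can be computed from classifier scores by sweeping the decision threshold and locating the point where precision and recall coincide. The `prbep` helper below is a minimal sketch for this purpose, not part of the chapter's evaluation code.

```python
import numpy as np

def prbep(y_true, scores):
    """Precision-recall break-even point: the value at the threshold
    where precision (TP/(TP+FP)) and recall (TP/(TP+FN)) cross."""
    order = np.argsort(-np.asarray(scores))   # rank examples by descending score
    y = np.asarray(y_true)[order]
    tp = np.cumsum(y == 1)                    # true positives at each cutoff
    k = np.arange(1, len(y) + 1)              # predicted positives at each cutoff
    precision = tp / k
    recall = tp / max((y == 1).sum(), 1)
    i = np.argmin(np.abs(precision - recall)) # closest approach of the two curves
    return (precision[i] + recall[i]) / 2

# toy example: 4 positives among 8 examples
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.75, 0.6, 0.55, 0.4, 0.3, 0.1]
print(prbep(y_true, scores))  # 0.75
```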
6.3.5 Experiments and Empirical Evaluation
We study the performance of the algorithm on various benchmark real-world
datasets, including MNIST, USPS, several categories of Reuters-21578 collection,
five topics from CiteSeer, and three datasets from the University of California,
Irvine (UCI) repository. The characteristics of the datasets are outlined in [19].
In the experiments, an early stopping heuristic for AL is employed, as it has been
shown that AL converges to the solution faster than the random sample selection
method [19]. A theoretically sound method to stop training is when the examples
in the margin are exhausted. To check whether there are still unseen training
examples in the margin, the distance of the newly selected example is compared
against the support vectors of the current model. If the newly selected example
by AL (closest to the hyperplane) is not closer than any of the support vectors,
it is concluded that the margin is exhausted. A practical implementation of this
idea is to count the number of support vectors during the AL training process. If
the number of support vectors stabilizes, it implies that all possible support
vectors have been selected by the AL method.
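The stopping heuristic described above can be sketched as a pool-based active learning loop that queries the example closest to the hyperplane and halts once the support-vector count stops growing. This is a minimal illustration using scikit-learn on synthetic data; the variable names and the `patience` window are assumptions, not taken from the chapter.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=600, n_features=10, random_state=0)

labeled = list(rng.choice(len(X), size=20, replace=False))  # small seed set
pool = [i for i in range(len(X)) if i not in labeled]

history, patience = [], 5  # stop after 5 rounds with no change in SV count
while pool:
    clf = SVC(kernel="linear", C=1.0).fit(X[labeled], y[labeled])
    # query the pool example closest to the hyperplane (most informative)
    dist = np.abs(clf.decision_function(X[pool]))
    q = pool.pop(int(np.argmin(dist)))
    labeled.append(q)
    history.append(len(clf.support_))
    # margin exhausted: support-vector count stable for `patience` rounds
    if len(history) > patience and len(set(history[-patience:])) == 1:
        break

print(f"stopped after {len(labeled)} labeled examples, "
      f"{history[-1]} support vectors")
```

In practice this avoids labeling the entire pool: once newly queried points no longer fall inside the margin, they cannot become support vectors and training can stop.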
As the first experiment, examples are randomly removed from the minority class in the Adult dataset to achieve different data imbalance ratios, and SVM-based AL is compared with random sampling (RS). 4 For brevity, AL with small pools is referred to simply as AL, since the small-pools heuristic is used for all AL methods considered later. Comparisons of PRBEP in Figure 6.4 show an interesting
behavior. As the class imbalance ratio increases, the AL curves display peaks in the early steps of learning. This implies that, by using an early stopping criterion, AL can give higher prediction performance than RS can possibly achieve even after using all the training data. The learning curves presented in Figure 6.4
demonstrate that the addition of instances to a model's training after finding those
most informative instances can be detrimental to the prediction performance of
the classifier, as this may cause the model to suffer from overfitting. The curves in Figure 6.4 show that generalization can peak at a level above what can be achieved using all available training data. In other words, it is possible to achieve better
classification performance from a small informative subset of the training data
than what can be achieved using all available training data. This finding agrees
with that of Schohn and Cohn [23] and strengthens the case for applying early stopping to AL algorithms.
For further comparison of the performance of a model built on all available
data (batch) and AL subject to an early stopping criterion, refer to Table 6.1, which compares the g-means and AUC values of the two methods. The data efficiency column for AL indicates that by processing only a portion of the examples from the
4 Here, the random process is assumed to be uniform; examples are selected with equal probability
from the available pool.