where precision equals recall. Precision is defined as TruePos/(TruePos + FalsePos), and recall is defined as TruePos/(TruePos + FalseNeg).
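As a concrete illustration, the break-even point can be computed from classifier scores by sweeping the decision threshold and locating the point where precision and recall coincide. The `prbep` helper below is a minimal sketch for this purpose, not part of the chapter's evaluation code.

```python
import numpy as np

def prbep(y_true, scores):
    """Precision-recall break-even point: the value at the threshold
    where precision (TP/(TP+FP)) and recall (TP/(TP+FN)) cross."""
    order = np.argsort(-np.asarray(scores))   # rank examples by descending score
    y = np.asarray(y_true)[order]
    tp = np.cumsum(y == 1)                    # true positives at each cutoff
    k = np.arange(1, len(y) + 1)              # predicted positives at each cutoff
    precision = tp / k
    recall = tp / max((y == 1).sum(), 1)
    i = np.argmin(np.abs(precision - recall)) # closest approach of the two curves
    return (precision[i] + recall[i]) / 2

# toy example: 4 positives among 8 examples
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.75, 0.6, 0.55, 0.4, 0.3, 0.1]
print(prbep(y_true, scores))  # 0.75
```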
6.3.5 Experiments and Empirical Evaluation
We study the performance of the algorithm on various benchmark real-world
datasets, including MNIST, USPS, several categories of Reuters-21578 collection,
five topics from CiteSeer, and three datasets from the University of California,
Irvine (UCI) repository. The characteristics of the datasets are outlined in [19].
In the experiments, an early stopping heuristic for AL is employed, as it has been
shown that AL converges to the solution faster than the random sample selection
method [19]. A theoretically sound method to stop training is when the examples
in the margin are exhausted. To check whether there are still unseen training
examples in the margin, the distance of the newly selected example is compared
against the support vectors of the current model. If the newly selected example
by AL (closest to the hyperplane) is not closer than any of the support vectors,
it is concluded that the margin is exhausted. A practical implementation of this
idea is to count the number of support vectors during the AL training process. If
the number of support vectors stabilizes, it implies that all possible support
vectors have been selected by the AL method.
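The stopping heuristic described above can be sketched as a pool-based active learning loop that queries the example closest to the hyperplane and halts once the support-vector count stops growing. This is a minimal illustration using scikit-learn on synthetic data; the variable names and the `patience` window are assumptions, not taken from the chapter.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=600, n_features=10, random_state=0)

labeled = list(rng.choice(len(X), size=20, replace=False))  # small seed set
pool = [i for i in range(len(X)) if i not in labeled]

history, patience = [], 5  # stop after 5 rounds with no change in SV count
while pool:
    clf = SVC(kernel="linear", C=1.0).fit(X[labeled], y[labeled])
    # query the pool example closest to the hyperplane (most informative)
    dist = np.abs(clf.decision_function(X[pool]))
    q = pool.pop(int(np.argmin(dist)))
    labeled.append(q)
    history.append(len(clf.support_))
    # margin exhausted: support-vector count stable for `patience` rounds
    if len(history) > patience and len(set(history[-patience:])) == 1:
        break

print(f"stopped after {len(labeled)} labeled examples, "
      f"{history[-1]} support vectors")
```

In practice this avoids labeling the entire pool: once newly queried points no longer fall inside the margin, they cannot become support vectors and training can stop.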
As the first experiment, examples are randomly removed from the minority class in the Adult dataset to achieve different data imbalance ratios, and SVM-based AL is compared with random sampling (RS). 4 For brevity, AL with small pools is referred to simply as AL, since the small-pools heuristic is used for all AL methods considered later. Comparisons of PRBEP in Figure 6.4 show an interesting
behavior. As the class imbalance ratio increases, the AL curves display peaks in the early steps of learning. This implies that, by using an early stopping criterion, AL can give higher prediction performance than RS can possibly achieve even after using all the training data. The learning curves presented in Figure 6.4
demonstrate that the addition of instances to a model's training after finding those
most informative instances can be detrimental to the prediction performance of
the classifier, as this may cause the model to suffer from overfitting. The curves in Figure 6.4 show that generalization can peak at a level above what can be achieved using all available training data. In other words, it is possible to achieve better
classification performance from a small informative subset of the training data
than what can be achieved using all available training data. This finding agrees
with that of Schohn and Cohn [23] and strengthens the case for applying early stopping to AL algorithms.
For further comparison of the performance of a model built on all available
data (batch) and AL subject to an early stopping criterion, refer to Table 6.1, which compares the g-means and AUC values of the two methods. The data efficiency column for AL indicates that by processing only a portion of the examples from the
4 Here, the random process is assumed to be uniform; examples are selected with equal probability
from the available pool.