of this model. The feature values for each instance are then collected and the
complete instances are added to the training set. The model is reconstructed and
the process is repeated until n examples are obtained (because the budget is
exhausted or some other stopping criterion, such as a computational limit, is met).
Note that this situation can be considered to be a special case of the instance
completion setting of active feature-value acquisition (cf. [44]). It is a degenerate
special case because, before the selection, there is no information at all about the
instances other than their classes.
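The acquisition loop just described can be sketched in a few lines. This is an illustrative skeleton, not an implementation from the text: the function names, the callable interfaces, and the uniform class-selection placeholder (which the uncertainty-based policies below would replace) are all assumptions.

```python
import random

def active_class_selection(classes, acquire, train, n_examples, seed=0):
    """Sketch of the ACS loop: pick a class, acquire a complete instance
    for it, retrain, and repeat until n_examples are obtained.

    classes    : list of class labels available for selection
    acquire    : callable(class_label) -> feature vector for a new instance
    train      : callable(training_set) -> model
    n_examples : stopping criterion (e.g., the acquisition budget)
    """
    rng = random.Random(seed)
    training_set = []
    model = train(training_set)            # initial model, no instances yet
    while len(training_set) < n_examples:
        c = rng.choice(classes)            # placeholder policy: uniform over classes
        x = acquire(c)                     # collect feature values for the instance
        training_set.append((x, c))        # add the completed instance
        model = train(training_set)        # reconstruct the model
    return model, training_set
```

In practice the call to `rng.choice` would be replaced by sampling from the per-class score distribution produced by one of the selection heuristics discussed below.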
For the specific problem at the heart of ACS, the extreme lack of infor-
mation to guide selection leads to the development of unique uncertainty and
utility estimators, which, in the absence of predictive covariates, require unique
approximations. 10 While alternative approaches to ACS have emerged, for the-
matic clarity, uncertainty-based and expected-utility-based approaches will be
presented first. Note that because effective classification requires that both sides
of a prediction boundary be represented, unlike typical AL techniques, ACS
typically samples classes from their respective score distributions [45, 46].
6.8.1.1 Uncertainty-Based Approaches This family of techniques for
performing ACS is based on the volatility in the predictions made about certain
classes—those classes whose cross-validated predictions are subject to the most
change between successive epochs of instance selection are likely to be based
on an uncertain predictor and amenable to refinement by the incorporation of
additional training data [38, 40]. Analogous to the case of more traditional
uncertainty-based data acquisition, several heuristics have been devised to
capture the notion of variability.
One measure of the uncertainty of a learned model is how volatile its predictive
performance is in the face of new training data. Consider typical learning curves,
such as those presented in Figure 6.6. Notice that the models are much more
volatile at the left side of the figure, showing large changes in generalization
performance for the same amount of new training data. Intuitively, as a
predictor gains knowledge of the problem space, it solidifies, exhibiting less
change and greater certainty in the face of new data. For ACS, we might
wonder if the learning curves will be equally steep regardless of the class of
the training data [38-40]. With this in mind, we can select instances at epoch
t from the classes in proportion to their improvement in accuracy from epoch
t − 2 to epoch t − 1. For example, we could use cross-validation to estimate the
generalization performance of the classifier with respect to each class, A(c);
class c can then be sampled according to:
p_t(c) = \frac{\max(0, A_{t-1}(c) - A_{t-2}(c))}{\sum_{c'} \max(0, A_{t-1}(c') - A_{t-2}(c'))}
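This sampling rule, selecting each class in proportion to its positive accuracy gain over the last two epochs, can be sketched as follows. The dictionary interface and the uniform fallback when no class has improved are assumptions for illustration, not details from the text:

```python
def acs_class_probabilities(acc_prev, acc_prev2):
    """Compute p_t(c) from per-class cross-validated accuracy estimates.

    acc_prev  : dict mapping class -> accuracy A(c) at epoch t-1
    acc_prev2 : dict mapping class -> accuracy A(c) at epoch t-2
    """
    # Only positive improvements contribute, per the max(0, ...) in the rule.
    gains = {c: max(0.0, acc_prev[c] - acc_prev2[c]) for c in acc_prev}
    total = sum(gains.values())
    if total == 0.0:
        # No class improved; fall back to uniform sampling (an assumption,
        # since the normalizer would otherwise be zero).
        return {c: 1.0 / len(gains) for c in gains}
    return {c: g / total for c, g in gains.items()}
```

A class whose cross-validated accuracy changed most between successive epochs thus receives the largest share of the next acquisitions, matching the intuition that its predictor is still uncertain and amenable to refinement.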
10 In realistic settings, such as guided learning (a potential application of ACS), this
lack-of-information assumption may be softened.