CLASS IMBALANCE AND ACTIVE LEARNING - Imbalanced Learning: Foundations, Algorithms, and Applications


regions of minority and majority instances. Prior research has shown that such

“small disjuncts” can comprise a large portion of a target class in some domains

[37]. For AL, these small subconcepts act the same as rare classes: if a learner has

seen no instances of the subconcept, how can it “know” which instances to label?

Note that this is not simply a problem of using the wrong loss function: in an AL

setting, the learner does not even know that the instances of the subconcept are

misclassified if no instances of a subconcept have yet been labeled. Nonetheless,

in a research setting (where we know all the labels), using an undiscriminative

loss function, such as classification accuracy or even the AUROC, may result in

the researcher not even realizing that an important subconcept has been missed.
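The masking effect described above can be illustrated with a small arithmetic sketch. All numbers here are hypothetical (a pool with an 80:1 skew in which 10 of the 100 minority instances form one small disjunct), and only classification accuracy is computed; the names `accuracy` and `recall_on` are illustrative helpers, not taken from the text.

```python
def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall_on(indices, y_true, y_pred):
    """Fraction of the given instances that are classified correctly."""
    hits = sum(1 for i in indices if y_pred[i] == y_true[i])
    return hits / len(indices)


# Hypothetical pool: 8000 majority instances (label 0) and 100 minority
# instances (label 1), 10 of which belong to a small disjunct.
n_major, n_minor_main, n_minor_small = 8000, 90, 10
y_true = [0] * n_major + [1] * (n_minor_main + n_minor_small)

# Assumed model behaviour: correct everywhere except on the small disjunct,
# which it has never seen labeled and therefore assigns to the majority class.
y_pred = [0] * n_major + [1] * n_minor_main + [0] * n_minor_small

small_idx = list(range(n_major + n_minor_main, len(y_true)))

overall_accuracy = accuracy(y_true, y_pred)
small_disjunct_recall = recall_on(small_idx, y_true, y_pred)

print(f"overall accuracy:      {overall_accuracy:.4f}")
print(f"small-disjunct recall: {small_disjunct_recall:.4f}")
```

Under these assumptions the overall accuracy stays above 0.99 even though every instance of the subconcept is misclassified, which is why a per-subconcept check is needed to notice the gap.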

To demonstrate how small disjuncts influence (active) model learning, consider

the following text classification problem: separating the Science articles from

the non-Science articles within a subset of the 20 newsgroups benchmark set

(with an induced class skew of 80:1). Figure 6.11 examines graphically the

relative positions of the minority instances over the course of AL. The black curve

shows the AUC (right vertical axis) of the models learned by a logistic regression

classifier using uncertainty sampling, rescaled as follows. At each epoch, we sort

all instances by their predicted probability of membership in the majority class,

P(y = 0 | x). The black dots in Figure 6.11 represent the minority class instances,

with the value on the left vertical axis showing their relative position in this

sorted list. The x-axis shows the AL epoch (here each epoch requests 30 new

instances from the pool). The black trajectories mostly show instances' relative

[Plot for Figure 6.11 not reproduced; horizontal axis: Epoch #, ticks up to 350.]

Figure 6.11 A comparison of the learned model's ordering of the instance pool along

with the quality of the cross-validated AUC.
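The per-epoch bookkeeping behind the figure can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the ascending sort direction, the normalization of ranks to the range 0 to 1, and the toy probabilities are all assumptions. The model itself is left out; the sketch only takes a list of predicted values of P(y = 0 | x) as input.

```python
def relative_positions(p_majority, labels, minority_label=1):
    """Relative rank (0 to 1) of each minority instance when the pool is
    sorted by predicted P(y = 0 | x) in ascending order (assumed direction)."""
    order = sorted(range(len(p_majority)), key=lambda i: p_majority[i])
    denom = max(len(order) - 1, 1)  # avoid division by zero for a 1-item pool
    rank_of = {idx: rank / denom for rank, idx in enumerate(order)}
    return {i: rank_of[i] for i, y in enumerate(labels) if y == minority_label}


def uncertainty_batch(p_majority, unlabeled, batch_size=30):
    """Indices of the unlabeled instances whose predicted probability is
    closest to 0.5, i.e. those the model is least certain about."""
    ranked = sorted(unlabeled, key=lambda i: abs(p_majority[i] - 0.5))
    return ranked[:batch_size]


# Toy pool of six instances with made-up probabilities; 1 marks the minority.
p_majority = [0.95, 0.90, 0.10, 0.55, 0.48, 0.99]
labels = [0, 0, 1, 0, 1, 0]

positions = relative_positions(p_majority, labels)
batch = uncertainty_batch(p_majority, unlabeled=range(len(labels)), batch_size=2)

print(positions)  # relative rank of each minority instance, keyed by index
print(batch)      # indices that would be sent for labeling this epoch
```

In a full AL loop these two steps would be repeated once per epoch after retraining the classifier, with the default batch size of 30 matching the setting described in the text; recording `positions` at every epoch gives one trajectory per minority instance.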
