CLASS IMBALANCE AND ACTIVE LEARNING - Imbalanced Learning: Foundations, Algorithms, and Applications

task is to differentiate sports web pages from nonsports pages. Depending on the source of the data (e.g., different impression streams from different online advertisers), one could see very different degrees of class skew in the population of relevant web pages. The panels in Figure 6.10, from left to right, depict increasing amounts of induced class skew. On the far left, we see that for a balanced class distribution, uncertainty sampling is indeed better than RS. For a 10:1 distribution, uncertainty sampling has some problems very early on, but soon does better than RS, even more so than in the balanced case. However, as the skew becomes large, not only does RS start to fail (it finds fewer and fewer minority instances, and its learning suffers), but uncertainty sampling also does substantially worse than random for a considerable amount of labeling expenditure. In the most extreme case shown,⁶ both RS and uncertainty sampling simply fail completely: RS effectively does not select any positive examples, and neither does uncertainty sampling.⁷
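
For reference, the two selection rules compared in Figure 6.10 can be sketched as follows. This is a minimal sketch, not the chapter's experimental code: `predict_proba` stands for any hypothetical function returning a model's positive-class probability for an instance, and the 1,000-label budget in the usage is an assumed figure. The arithmetic shows why RS stalls at extreme skew: at 10,000:1, a uniformly random batch of 1,000 labels is expected to contain only about 0.1 positives.

```python
import random

def random_sample(pool_ids, k, rng):
    """Random sampling (RS): choose k unlabeled instances uniformly at random."""
    return rng.sample(pool_ids, k)

def uncertainty_sample(pool_ids, predict_proba, k):
    """Uncertainty sampling: choose the k instances whose predicted
    positive-class probability is closest to 0.5 (least confident)."""
    return sorted(pool_ids, key=lambda i: abs(predict_proba(i) - 0.5))[:k]

def expected_positives(n_labels, skew):
    """Expected number of minority (positive) instances among n_labels
    uniformly random picks from a pool with a skew:1 class ratio."""
    return n_labels / (skew + 1)

if __name__ == "__main__":
    # Hypothetical budget of 1,000 labels at the skews discussed in the text.
    for skew in (1, 10, 10_000):
        print(f"{skew}:1 -> {expected_positives(1000, skew):.2f} expected positives")
    # Toy scores: instance 1 is the least certain, then instance 3.
    probs = {0: 0.9, 1: 0.48, 2: 0.1, 3: 0.6}
    print(uncertainty_sample(list(probs), probs.get, 2))  # [1, 3]
    print(random_sample(list(probs), 2, random.Random(0)))
```

The expected counts printed are 500.00, 90.91, and 0.10. Note that the sketch only explains RS's failure; it does not by itself explain why uncertainty sampling also fails at that skew.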

A practitioner well versed in the AL literature may decide to use a method other than uncertainty sampling in such a highly skewed domain. A variety of techniques have been discussed in Sections 6.2-6.4 for performing AL specifically under class imbalance, including [18-21, 35], as well as for performing density-sensitive AL, in which the geometry of the problem space is explicitly taken into account when making selections, including [13-15, 17, 36]. While initially appealing, these techniques may not provide better results than more traditional AL techniques as problems become increasingly difficult; indeed, class skews may be high enough to thwart them completely [33].
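
To make "density-sensitive" concrete, here is a minimal sketch of one widely used form, often called information density: an instance's uncertainty is multiplied by its average similarity to the rest of the unlabeled pool, so isolated outliers are down-weighted relative to uncertain instances in dense regions. The cosine similarity, the `beta` exponent, and the function names are assumptions for illustration; the methods cited as [13-15, 17, 36] may measure and weight density differently.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def density_weighted_pick(pool, proba, beta=1.0):
    """Index of the pool instance maximizing
    uncertainty * (mean similarity to the other pool instances) ** beta."""
    best_idx, best_score = None, -math.inf
    for i, x in enumerate(pool):
        # 1.0 when p = 0.5, 0.0 when p is 0 or 1.
        uncertainty = 1.0 - 2.0 * abs(proba[i] - 0.5)
        sims = [cosine(x, y) for j, y in enumerate(pool) if j != i]
        # Clamp at 0 so a fractional beta never raises a negative base.
        density = max(0.0, sum(sims) / len(sims)) if sims else 0.0
        score = uncertainty * density ** beta
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx

if __name__ == "__main__":
    # Instances 0 and 3 are equally uncertain (p = 0.5), but 3 is an outlier.
    pool = [[1.0, 0.0], [1.0, 0.1], [0.9, 0.1], [0.0, 1.0]]
    proba = [0.5, 0.45, 0.45, 0.5]
    print(density_weighted_pick(pool, proba))  # 0
```

Plain uncertainty sampling would see instances 0 and 3 as tied; the density term breaks the tie toward the instance that sits in a populated region. Setting `beta=0.0` recovers pure uncertainty sampling.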

As discussed later in Section 6.8.1, Attenberg and Provost [33] proposed an alternative way of using human resources to produce a labeled training set: specifically, tasking people with finding class-specific instances (“guided learning”) as opposed to labeling specific instances. In some domains, finding such instances may even be cheaper than labeling (per instance). Guided learning can be much more effective per instance acquired; in one of Attenberg and Provost's experiments, it outperformed AL as long as searching for class-specific instances was less than eight times more expensive (per instance) than labeling selected instances. The generalization performance of guided learning, for the same setting as Figure 6.10, is shown in Figure 6.12 and discussed in Section 6.8.1.
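
The cost comparison behind the "less than eight times" result can be made concrete with some budget arithmetic. In the minimal sketch below, the budget and unit costs are hypothetical, the function names are illustrative, and the break-even ratio of 8 applies only to the reported experiment rather than being a general constant. At that ratio, a budget that buys 8,000 actively selected labels buys only 1,000 guided instances, so in that experiment a guided instance was worth roughly as much as eight actively labeled ones.

```python
def acquisitions(budget, label_cost, search_cost_ratio):
    """Instances a fixed budget buys under each strategy.

    label_cost: cost to label one actively selected instance.
    search_cost_ratio: cost of finding one class-specific instance,
        expressed as a multiple of label_cost.
    Returns (n_active_learning, n_guided).
    """
    n_al = budget // label_cost
    n_guided = budget // (label_cost * search_cost_ratio)
    return n_al, n_guided

def guided_preferred(search_cost_ratio, break_even_ratio=8):
    """True if searching is relatively cheaper than the break-even
    ratio observed for a given setting (8 in the cited experiment)."""
    return search_cost_ratio < break_even_ratio

if __name__ == "__main__":
    print(acquisitions(8000, 1, 8))  # (8000, 1000)
    print(guided_preferred(3))       # True
    print(guided_preferred(10))      # False
```

In practice the break-even ratio would have to be estimated per domain, for example by comparing learning curves like those in Figure 6.10 against the corresponding guided-learning curves.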

6.6 DEALING WITH DISJUNCTIVE CLASSES

More subtly still, certain problem spaces may not have such an extreme class skew but may still be particularly difficult because they possess important but very small disjunctive subconcepts, rather than simple continuously dense

⁶ 10,000:1, still orders of magnitude less skewed than some categories.

⁷ The curious behavior of AUC < 0.5 here is due to overfitting. Regularizing the logistic regression “fixes” the problem, and the curve hovers around 0.5. See another article in this issue for more insight on models exhibiting AUC < 0.5 [34].
