Given the option of incorporating prior knowledge associated with certain fea-
ture values into the predictive behavior of a model, the question then becomes:
which feature examples should be selected for labeling? Initially, this may seem
to be a deflection, replacing the search for useful information of one kind with
another. However, there are several reasons why seeking feature values to “label”
may be preferred to labeling examples. The most obvious reason for selecting
feature values for labeling is that traditional example labeling is often "slow."
It may take many labeled movie reviews to teach the model that "amazing" has
a positive association, rather than being just another uninformative term that
happens to occur in positive reviews by coincidence. In this way, the cold-start
problem faced in AL 12 may be less acute in AFL and ADS: while complex
feature/class relationships may be difficult to capture using AFL, reasonable
generalization performance is often achievable with few requests to an oracle.
Second, the labor costs associated with assigning a class polarity to a certain
feature value are often quite low: it is easy for a human to associate the term
"terrible" with negative movie reviews, while labeling one particular movie
review as positive or negative requires reading the entire document. Of course,
not every term is polar; in the ongoing
example of movie reviews, it is easy to imagine that a few terms have a natural
association with the positive or negative class, while most terms on their own
do not have such a polarity. However, this imbalance between polar and non-
polar feature values is often far less acute than the imbalance between classes
in many machine learning problems. A problem domain with a class imbalance
of 1,000,000:1 may still have 1 in 10 features exhibiting some meaningful
class association. Perhaps even more importantly, the ratio between positively
and negatively linked feature values (for instance) may be far more balanced
than the ratio between those classes in the wild. In fact, there is not necessarily a
relationship between the base rate and the ratio of strongly identifiable positively
and negatively associated feature values. While selecting useful feature values in
practice is still often a challenging problem, experience has shown that it is often
more informative to select random features for labeling than random examples.
More intelligent selection heuristics can make this preference for AFL and ADS
even stronger.
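To make the idea concrete, the following is a minimal sketch (illustrative, not the chapter's method) of how a single labeled feature can substitute for many labeled documents. It injects an oracle's feature label ("amazing" is positive) as a pseudo-count prior in a multinomial Naive Bayes sentiment model; the vocabulary, training documents, and prior strength are all invented for illustration.

```python
# Illustrative sketch: a feature label as a pseudo-count prior in
# multinomial Naive Bayes. All data below is made up.
import math
from collections import defaultdict

VOCAB = ["amazing", "boring", "dull", "plot", "great", "movie"]
CLASSES = ["pos", "neg"]

# Tiny labeled set; note "amazing" appears only in a negative review,
# purely by coincidence.
docs = [
    (["amazing", "boring"], "neg"),
    (["dull", "plot"], "neg"),
    (["great", "movie"], "pos"),
]

# Per-class word counts with Laplace smoothing of 1.
counts = {c: defaultdict(lambda: 1.0) for c in CLASSES}
for words, label in docs:
    for w in words:
        counts[label][w] += 1.0

# Oracle feature label: "amazing" is positive, encoded as a
# pseudo-count added to the positive class (the strength, 10.0,
# is an arbitrary illustrative choice).
counts["pos"]["amazing"] += 10.0

def log_likelihood(words, c):
    total = sum(counts[c][w] for w in VOCAB)
    return sum(math.log(counts[c][w] / total) for w in words)

def classify(words):
    return max(CLASSES, key=lambda c: log_likelihood(words, c))

print(classify(["amazing"]))  # -> "pos"
```

Without the pseudo-count, the coincidental occurrence of "amazing" in a negative training review would push this one-word document to "neg"; the single feature label corrects that without any additional labeled documents.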
Again, giving a thorough survey of selection heuristics for performing AFL
is beyond the scope of this chapter. However, we will provide the reader with
a brief overview of the techniques typically employed for such tasks. As in
many selective data acquisition tasks for machine learning, we see two common
themes: uncertainty-based selection and expected-utility-based approaches. In the
following, we briefly present some of the more popular techniques for AFL,
delineated accordingly. We then discuss techniques for ADS, which select
features and examples for labeling simultaneously.
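As a concrete (and invented) illustration of the first theme, an uncertainty-based AFL heuristic might rank candidate features by the entropy of the model's current estimate of P(positive | feature present), querying the oracle about the most ambiguous terms first. The probabilities below are made up for the sketch.

```python
# Illustrative sketch of uncertainty-based feature selection:
# rank features by the binary entropy of the model's current
# class-association estimate. All probabilities are invented.
import math

p_pos = {
    "amazing": 0.55,    # seen in both classes; model is unsure
    "terrible": 0.05,   # already clearly negative
    "projector": 0.50,  # no signal either way
    "great": 0.95,      # already clearly positive
}

def binary_entropy(p):
    """Entropy of a Bernoulli(p) class association, in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Query order: most uncertain feature first.
ranked = sorted(p_pos, key=lambda f: binary_entropy(p_pos[f]), reverse=True)
print(ranked[:2])  # -> ['projector', 'amazing']
```

Note that a genuinely non-polar term like "projector" can look maximally uncertain under this score, which is one motivation for expected-utility approaches that instead estimate how much each candidate label would actually improve the model.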
6.8.2.1 Uncertainty-Based AFL

By far the most prevalent class of AL heuristics, uncertainty-based approaches
do not have a direct analogue to the selection
12 Recall Section 6.7.