of an expectation maximization (EM) procedure. This combined semi-supervised
AL has the benefit of ignoring regions that can be reliably “filled in” by a semi-
supervised procedure, while also selecting those examples that may benefit this
EM process.
Donmez et al. [16] propose a modification of the density-weighted technique
of Nguyen and Smeulders. This modification simply selects examples accord-
ing to the convex combination of the density-weighted technique and traditional
uncertainty sampling. This hybrid approach is again used within a so-called
dual active learner, which falls back to uncertainty sampling alone once the
benefits of pure density-sensitive sampling appear to diminish.
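The convex combination above can be sketched as follows. This is a minimal illustration, not the exact formulation of [16]: the score definitions, the mixing parameter `lam`, and the function names are assumptions for exposition.

```python
import numpy as np

def hybrid_utility(uncertainty, density, lam):
    """Convex combination of a density-weighted score and plain uncertainty.

    `uncertainty` and `density` are per-example scores over the unlabeled
    pool. The density-weighted score (density * uncertainty, in the spirit
    of Nguyen and Smeulders) is an illustrative assumption.
    lam = 1.0 recovers pure density-weighted sampling; lam = 0.0 recovers
    traditional uncertainty sampling, as in the dual learner's later phase.
    """
    density_weighted = density * uncertainty
    return lam * density_weighted + (1.0 - lam) * uncertainty

def select_next(uncertainty, density, lam):
    # Return the index of the highest-utility pool example.
    return int(np.argmax(hybrid_utility(uncertainty, density, lam)))
```

Setting `lam` to zero mimics the dual learner's switch to pure uncertainty sampling once density-sensitive selection stops paying off.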
Alternate Density-Sensitive Heuristics. Donmez and Carbonell [17] incorporate
density into active label selection by performing a change of coordinates into
a space whose metric expresses not only Euclidean similarity but also den-
sity. Examples are then chosen based on a density-weighted uncertainty metric
designed to select examples in pairs—one member of the pair from each side of
the current decision boundary. The motivation is that sampling from both sides
of the decision boundary may yield better results than selecting from one side in
isolation.
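The paired selection can be sketched as below. The density-sensitive coordinate transform of [17] is omitted here; the particular score (density times an inverse-margin uncertainty) and all names are illustrative assumptions, meant only to show the "one example from each side of the boundary" selection.

```python
import numpy as np

def select_pair(margins, density):
    """Pick the top density-weighted-uncertainty example on each side of
    the decision boundary (a sketch, not the exact metric of [17]).

    `margins` are signed distances to the current decision boundary;
    `density` is a per-example density estimate over the pool.
    """
    # Examples near the boundary (small |margin|) in dense regions score highest.
    uncertainty = 1.0 / (1.0 + np.abs(margins))
    score = density * uncertainty

    pos = np.where(margins >= 0)[0]   # one side of the boundary
    neg = np.where(margins < 0)[0]    # the other side
    pair = []
    if pos.size:
        pair.append(int(pos[np.argmax(score[pos])]))
    if neg.size:
        pair.append(int(neg[np.argmax(score[neg])]))
    return pair
```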
Through selection based on an “unsupervised” heuristic estimating the util-
ity of label acquisition on the pool of unlabeled instances, Roy and McCal-
lum [9] incorporate the geometry of the problem space into active selection
implicitly. This approach attempts to quantify the improvement in model perfor-
mance attributable to each unlabeled example, taken in expectation over all label
assignments:
    U_E(x) = Σ_{y_i ∈ Y} p(y_i | x) · U_e(x'; x, y = y_i)
Here, the probability of class membership in the above expectation comes from
the base model's current posterior estimates. The utility value on the right-hand
side, U_e(x'; x, y = y_i), comes from assuming a label of y_i for example x
and incorporating this pseudo-labeled example into
the training set temporarily. The improvement in model performance with the
inclusion of this new example is then measured. Since a selective label acquisi-
tion procedure may result in a small or arbitrarily biased set of examples, accurate
evaluation through nested cross-validation is difficult. To accommodate this, Roy
and McCallum propose two uncertainty measures taken over the pool of unlabeled
examples x'. Specifically, they look at the entropy of the posterior probabilities
of examples in the pool and the magnitude of the maximum posterior as
utility measures, both estimated after the inclusion of the “new” example. Both
metrics favor “sharp” posteriors, an optimization minimizing uncertainty rather
than model performance; instances are selected by their reduction in uncertainty
taken in expectation over the entire example pool.
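A minimal sketch of this expected-utility selection follows, using the entropy variant of the pool-uncertainty measure. The logistic regression base model, the helper names, and the retrain-per-candidate loop are illustrative assumptions, not the authors' implementation; the exhaustive retraining shown here is also far more expensive than practical approximations.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def expected_pool_entropy(X_lab, y_lab, X_pool, candidate_idx):
    """Expected pool entropy after labeling one candidate, in the spirit of
    Roy and McCallum [9] (model and utility choices are illustrative)."""
    base = LogisticRegression().fit(X_lab, y_lab)
    # Posterior p(y | x) for the candidate, from the current base model.
    p_y = base.predict_proba(X_pool[candidate_idx:candidate_idx + 1])[0]

    expected = 0.0
    for y_val, prob in zip(base.classes_, p_y):
        # Temporarily add the pseudo-labeled example (x, y = y_val) and retrain.
        X_aug = np.vstack([X_lab, X_pool[candidate_idx]])
        y_aug = np.append(y_lab, y_val)
        model = LogisticRegression().fit(X_aug, y_aug)
        # Entropy of the retrained model's posteriors over the remaining pool;
        # lower entropy means "sharper" posteriors.
        P = model.predict_proba(np.delete(X_pool, candidate_idx, axis=0))
        entropy = -np.sum(P * np.log(np.clip(P, 1e-12, None)))
        expected += prob * entropy
    return expected

def select_candidate(X_lab, y_lab, X_pool):
    # Choose the candidate whose pseudo-labeled inclusion minimizes
    # expected pool uncertainty.
    scores = [expected_pool_entropy(X_lab, y_lab, X_pool, i)
              for i in range(len(X_pool))]
    return int(np.argmin(scores))
```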