problem faced in ADS; feature-value/class associations factor quite differently
into the formation and training of machine learning models than more traditional
example labels. Nonetheless, approaches thematically similar to the ADS problem
have been developed, where features are selected for labeling according to some
notion of stability within the current model. As in traditional uncertainty selection
in AL, the primary difference among these uncertainty-based ADS techniques is
the manner by which feature uncertainty is estimated. In the simplest case, using
a linear model, the coefficients on particular feature values may be interpreted
as a measure of uncertainty, with lower coefficient magnitude corresponding to
a greater degree of uncertainty [59]. The naïve Bayes-like model presented
in [52] offers a more appealing option: feature-value label uncertainty can be
measured by the magnitude of the log-odds of the feature-value likelihoods,
|log p(f |+)/p(f |−)|, for feature value f and classes + and −. Again,
a smaller value corresponds to increased uncertainty.
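This log-odds measure can be sketched in a few lines. The likelihood values below are hypothetical estimates, not drawn from [52]:

```python
import math

def log_odds_uncertainty(p_pos, p_neg):
    """Magnitude of the log-odds of a feature value's class-conditional
    likelihoods, |log p(f|+)/p(f|-)|, under a naive-Bayes-like model.
    Smaller magnitudes indicate greater label uncertainty."""
    return abs(math.log(p_pos / p_neg))

# A strongly polar term: much more likely under the positive class.
polar = log_odds_uncertainty(0.030, 0.002)    # |log 15| ~ 2.71
# A neutral term: near-equal likelihood under both classes.
neutral = log_odds_uncertainty(0.051, 0.049)  # near zero -> highly uncertain

print(polar > neutral)  # the polar term is the more certain of the two
```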
A range of other techniques for uncertainty-based AFL exist. By creating
one-term pseudo-documents, Godbole et al. [60] coerce the notion of
feature label uncertainty into a more traditional instance uncertainty framework
for text classification tasks. By incorporating the label information on each fea-
ture value with unlabeled examples, Druck et al. [58] create a corresponding
generalized expectation term that rates the model's predicted class distribution
conditioned on the presence of the particular feature. This rating penalizes these
predicted class distributions according to their KL-divergence from reference
distributions constructed using labeled features. Similarly, Liang et al. [61] learn
from labeled examples and actively selected constraints in the form of expec-
tations with some associated noise from particular examples. Druck et al. [62]
analyze several uncertainty-based selection techniques for gathering feature labels
when training conditional random fields, finding that the total uncertainty (mea-
sured as the sum of the marginal entropies) tends to favor more frequent features.
As a remedy, they propose an uncertainty scheme where the mean uncertainty is
weighted by the log of the counts of the associated feature values.
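The remedy described above can be sketched as follows, using made-up per-occurrence marginal entropies rather than values from an actual conditional random field:

```python
import math

def total_uncertainty(entropies):
    """Sum of a feature's marginal entropies, one per occurrence.
    This criterion tends to favor frequent features."""
    return sum(entropies)

def weighted_mean_uncertainty(entropies):
    """Mean marginal entropy weighted by the log of the feature's
    occurrence count, the remedy proposed in [62]."""
    count = len(entropies)
    return math.log(count) * (sum(entropies) / count)

# A very frequent but unambiguous feature vs. a rarer, ambiguous one.
frequent = [0.05] * 1000  # 1000 occurrences, low entropy each
rare = [0.90] * 50        # 50 occurrences, high entropy each

print(total_uncertainty(frequent) > total_uncertainty(rare))            # frequency dominates
print(weighted_mean_uncertainty(rare) > weighted_mean_uncertainty(frequent))  # ambiguity dominates
```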
Interestingly, even for reasonable uncertainty estimators, feature/class uncer-
tainty may not be a desirable criterion for selection. Consider the discussion
made previously regarding the preponderance of uninformative features. Clearly,
in the case of document classification, terms such as “of” and “the” will sel-
dom have any class polarity. At the same time, these terms are likely to have
a high degree of uncertainty, leaving uncertainty-based approaches to perform
poorly in practice. Preferring to select features based on certainty, that is,
selecting those features with the least uncertainty, seems to work much better in
practice [59, 63].
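A minimal sketch of certainty-based selection, ranking feature values by the largest log-odds magnitude so that polar terms outrank stopwords (the terms and likelihoods are hypothetical):

```python
import math

def select_by_certainty(likelihoods, k=2):
    """Ranks feature values by certainty -- largest |log-odds| first --
    and returns the top k. `likelihoods` maps each feature value to
    (p(f|+), p(f|-)) estimates."""
    def certainty(f):
        p_pos, p_neg = likelihoods[f]
        return abs(math.log(p_pos / p_neg))
    return sorted(likelihoods, key=certainty, reverse=True)[:k]

terms = {
    "the":       (0.050, 0.049),  # ubiquitous, no class polarity
    "of":        (0.041, 0.040),
    "excellent": (0.009, 0.001),  # class-polar terms rise to the top
    "terrible":  (0.001, 0.012),
}
print(select_by_certainty(terms))  # -> ['terrible', 'excellent']
```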
6.8.2.2 Expected Utility-Based AFL

As with traditional uncertainty (and certainty) sampling for AL, the query
corresponding to the greatest level of uncertainty may not necessarily be the
query offering the greatest level of information
to the model. This is particularly true in noisy or complex environments. Instead
of using a heuristic to estimate the information value of a particular feature label,