problem faced in ADS; feature-value/class associations factor quite differently
into the formation and training of machine learning models than more traditional
example labels. Nonetheless, approaches thematically similar to the ADS problem
have been developed, where features are selected for labeling according to some
notion of stability within the current model. As in traditional uncertainty selection
in AL, the primary difference among these uncertainty-based ADS techniques is
the manner by which feature uncertainty is estimated. In the simplest case, using
a linear model, the coefficients on particular feature values may be interpreted
as a measure of uncertainty, with lower coefficient magnitude corresponding to
a greater degree of uncertainty [59]. The naïve Bayes-like model presented
in [52] offers a more appealing option: feature-value label uncertainty can be
measured by the magnitude of the log-odds of the feature-value likelihoods,
|log p(f |+)/p(f |−)|, for feature value f and classes + and −. Again,
a smaller value corresponds to increased uncertainty.
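This log-odds measure can be sketched in a few lines. The likelihood values below are hypothetical estimates, not drawn from [52]:

```python
import math

def log_odds_uncertainty(p_pos, p_neg):
    """Magnitude of the log-odds of a feature value's class-conditional
    likelihoods, |log p(f|+)/p(f|-)|, under a naive-Bayes-like model.
    Smaller magnitudes indicate greater label uncertainty."""
    return abs(math.log(p_pos / p_neg))

# A strongly polar term: much more likely under the positive class.
polar = log_odds_uncertainty(0.030, 0.002)    # |log 15| ~ 2.71
# A neutral term: near-equal likelihood under both classes.
neutral = log_odds_uncertainty(0.051, 0.049)  # near zero -> highly uncertain

print(polar > neutral)  # the polar term is the more certain of the two
```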
A range of other techniques for uncertainty-based AFL exist. By creating
one-term pseudo-documents, Godbole et al. [60] coerce the notion of
feature label uncertainty into a more traditional instance uncertainty framework
for text classification tasks. By incorporating the label information on each fea-
ture value with unlabeled examples, Druck et al. [58] create a corresponding
generalized expectation term that rates the model's predicted class distribution
conditioned on the presence of the particular feature. This rating penalizes these
predicted class distributions according to their KL-divergence from reference
distributions constructed using labeled features. Similarly, Liang et al. [61] learn
from labeled examples and actively selected constraints in the form of expec-
tations with some associated noise from particular examples. Druck et al. [62]
analyze several uncertainty-based selection techniques for gathering feature labels
when training conditional random fields, finding that the total uncertainty (mea-
sured as the sum of the marginal entropies) tends to favor more frequent features.
As a remedy, they propose an uncertainty scheme where the mean uncertainty is
weighted by the log of the counts of the associated feature values.
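The remedy described above can be sketched as follows, using made-up per-occurrence marginal entropies rather than values from an actual conditional random field:

```python
import math

def total_uncertainty(entropies):
    """Sum of a feature's marginal entropies, one per occurrence.
    This criterion tends to favor frequent features."""
    return sum(entropies)

def weighted_mean_uncertainty(entropies):
    """Mean marginal entropy weighted by the log of the feature's
    occurrence count, the remedy proposed in [62]."""
    count = len(entropies)
    return math.log(count) * (sum(entropies) / count)

# A very frequent but unambiguous feature vs. a rarer, ambiguous one.
frequent = [0.05] * 1000  # 1000 occurrences, low entropy each
rare = [0.90] * 50        # 50 occurrences, high entropy each

print(total_uncertainty(frequent) > total_uncertainty(rare))            # frequency dominates
print(weighted_mean_uncertainty(rare) > weighted_mean_uncertainty(frequent))  # ambiguity dominates
```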
Interestingly, even for reasonable uncertainty estimators, feature/class uncer-
tainty may not be a desirable criterion for selection. Consider the discussion
made previously regarding the preponderance of uninformative features. Clearly,
in the case of document classification, terms such as “of” and “the” will sel-
dom have any class polarity. At the same time, these terms are likely to have
a high degree of uncertainty, leaving uncertainty-based approaches to perform
poorly in practice. Preferring to select features based on certainty, that is,
selecting those features with the least uncertainty, seems to work much better in
practice [59, 63].
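A minimal sketch of certainty-based selection, ranking feature values by the largest log-odds magnitude so that polar terms outrank stopwords (the terms and likelihoods are hypothetical):

```python
import math

def select_by_certainty(likelihoods, k=2):
    """Ranks feature values by certainty -- largest |log-odds| first --
    and returns the top k. `likelihoods` maps each feature value to
    (p(f|+), p(f|-)) estimates."""
    def certainty(f):
        p_pos, p_neg = likelihoods[f]
        return abs(math.log(p_pos / p_neg))
    return sorted(likelihoods, key=certainty, reverse=True)[:k]

terms = {
    "the":       (0.050, 0.049),  # ubiquitous, no class polarity
    "of":        (0.041, 0.040),
    "excellent": (0.009, 0.001),  # class-polar terms rise to the top
    "terrible":  (0.001, 0.012),
}
print(select_by_certainty(terms))  # -> ['terrible', 'excellent']
```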
6.8.2.2 Expected Utility-Based AFL

As with traditional uncertainty (and certainty) sampling for AL, the query
corresponding to the greatest level of uncertainty may not necessarily be the
query offering the greatest level of information
to the model. This is particularly true in noisy or complex environments. Instead
of using a heuristic to estimate the information value of a particular feature label,