of an expectation maximization (EM) procedure. This combined semi-supervised
AL has the benefit of ignoring regions that can be reliably “filled in” by a semi-
supervised procedure, while also selecting those examples that may benefit this
EM process.
Donmez et al. [16] propose a modification of the density-weighted technique
of Nguyen and Smeulders. This modification simply selects examples accord-
ing to the convex combination of the density-weighted technique and traditional
uncertainty sampling. This hybrid approach is again used within a so-called
dual active learner, which falls back to uncertainty sampling alone once the
benefits of pure density-sensitive sampling appear to diminish.
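The convex combination above can be sketched as follows. This is a minimal illustration, not the exact formulation of [16]: the score definitions, the mixing parameter `lam`, and the function names are assumptions for exposition.

```python
import numpy as np

def hybrid_utility(uncertainty, density, lam):
    """Convex combination of a density-weighted score and plain uncertainty.

    `uncertainty` and `density` are per-example scores over the unlabeled
    pool. The density-weighted score (density * uncertainty, in the spirit
    of Nguyen and Smeulders) is an illustrative assumption.
    lam = 1.0 recovers pure density-weighted sampling; lam = 0.0 recovers
    traditional uncertainty sampling, as in the dual learner's later phase.
    """
    density_weighted = density * uncertainty
    return lam * density_weighted + (1.0 - lam) * uncertainty

def select_next(uncertainty, density, lam):
    # Return the index of the highest-utility pool example.
    return int(np.argmax(hybrid_utility(uncertainty, density, lam)))
```

Setting `lam` to zero mimics the dual learner's switch to pure uncertainty sampling once density-sensitive selection stops paying off.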
Alternate Density-Sensitive Heuristics. Donmez and Carbonell [17] incorporate
density into active label selection by performing a change of coordinates into
a space whose metric expresses not only Euclidean similarity but also den-
sity. Examples are then chosen based on a density-weighted uncertainty metric
designed to select examples in pairs—one member of the pair from each side of
the current decision boundary. The motivation is that sampling from both sides
of the decision boundary may yield better results than selecting from one side in
isolation.
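The paired selection can be sketched as below. The density-sensitive coordinate transform of [17] is omitted here; the particular score (density times an inverse-margin uncertainty) and all names are illustrative assumptions, meant only to show the "one example from each side of the boundary" selection.

```python
import numpy as np

def select_pair(margins, density):
    """Pick the top density-weighted-uncertainty example on each side of
    the decision boundary (a sketch, not the exact metric of [17]).

    `margins` are signed distances to the current decision boundary;
    `density` is a per-example density estimate over the pool.
    """
    # Examples near the boundary (small |margin|) in dense regions score highest.
    uncertainty = 1.0 / (1.0 + np.abs(margins))
    score = density * uncertainty

    pos = np.where(margins >= 0)[0]   # one side of the boundary
    neg = np.where(margins < 0)[0]    # the other side
    pair = []
    if pos.size:
        pair.append(int(pos[np.argmax(score[pos])]))
    if neg.size:
        pair.append(int(neg[np.argmax(score[neg])]))
    return pair
```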
Through selection based on an “unsupervised” heuristic estimating the util-
ity of label acquisition on the pool of unlabeled instances, Roy and McCal-
lum [9] incorporate the geometry of the problem space into active selection
implicitly. This approach attempts to quantify the improvement in model perfor-
mance attributable to each unlabeled example, taken in expectation over all label
assignments:
    U_E(x) = Σ_{y_i ∈ Y} p(y_i | x) · U_e(x'; x, y = y_i)
Here, the probability of class membership in the above expectation comes from
the base model's current posterior estimates. The utility value on the right-hand
side, U_e(x'; x, y = y_i), comes from assuming a label of y_i for example x
and incorporating this pseudo-labeled example into
the training set temporarily. The improvement in model performance with the
inclusion of this new example is then measured. Since a selective label acquisi-
tion procedure may result in a small or arbitrarily biased set of examples, accurate
evaluation through nested cross-validation is difficult. To accommodate this, Roy
and McCallum propose two uncertainty measures taken over the pool of unlabeled
examples x'. Specifically, they look at the entropy of the posterior probabilities
of examples in the pool and the magnitude of the maximum posterior as
utility measures, both estimated after the inclusion of the “new” example. Both
metrics favor “sharp” posteriors, an optimization minimizing uncertainty rather
than model performance; instances are selected by their reduction in uncertainty
taken in expectation over the entire example pool.
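A minimal sketch of this expected-utility selection follows, using the entropy variant of the pool-uncertainty measure. The logistic regression base model, the helper names, and the retrain-per-candidate loop are illustrative assumptions, not the authors' implementation; the exhaustive retraining shown here is also far more expensive than practical approximations.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def expected_pool_entropy(X_lab, y_lab, X_pool, candidate_idx):
    """Expected pool entropy after labeling one candidate, in the spirit of
    Roy and McCallum [9] (model and utility choices are illustrative)."""
    base = LogisticRegression().fit(X_lab, y_lab)
    # Posterior p(y | x) for the candidate, from the current base model.
    p_y = base.predict_proba(X_pool[candidate_idx:candidate_idx + 1])[0]

    expected = 0.0
    for y_val, prob in zip(base.classes_, p_y):
        # Temporarily add the pseudo-labeled example (x, y = y_val) and retrain.
        X_aug = np.vstack([X_lab, X_pool[candidate_idx]])
        y_aug = np.append(y_lab, y_val)
        model = LogisticRegression().fit(X_aug, y_aug)
        # Entropy of the retrained model's posteriors over the remaining pool;
        # lower entropy means "sharper" posteriors.
        P = model.predict_proba(np.delete(X_pool, candidate_idx, axis=0))
        entropy = -np.sum(P * np.log(np.clip(P, 1e-12, None)))
        expected += prob * entropy
    return expected

def select_candidate(X_lab, y_lab, X_pool):
    # Choose the candidate whose pseudo-labeled inclusion minimizes
    # expected pool uncertainty.
    scores = [expected_pool_entropy(X_lab, y_lab, X_pool, i)
              for i in range(len(X_pool))]
    return int(np.argmin(scores))
```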