space. Given a similarity metric between two points, sim(x, x'), information density selects examples according to:

U_m(x) = U(x) \left( \frac{1}{|X|} \sum_{x' \in X} \mathrm{sim}(x, x') \right)^{\beta}

Here, β is a hyper-parameter controlling the trade-off between the raw instance-specific utility, U(x), and the similarity component in the overall selection criterion.
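As a concrete illustration, the following Python sketch scores a pool of unlabeled examples under this criterion, assuming cosine similarity as sim(x, x') and a vector of precomputed base utilities U(x); the function name, the similarity choice, and the NumPy implementation are illustrative assumptions rather than details fixed by the text.

```python
import numpy as np

def information_density_scores(X, utility, beta=1.0):
    """Weight each example's base utility U(x) by its average similarity
    to the rest of the pool, raised to the power beta (a sketch)."""
    # Cosine similarity between every pair of pool examples (assumed sim(x, x')).
    X_unit = X / np.clip(np.linalg.norm(X, axis=1, keepdims=True), 1e-12, None)
    sim = X_unit @ X_unit.T
    density = sim.mean(axis=1)          # (1/|X|) * sum over x' of sim(x, x')
    return utility * density ** beta    # U_m(x) = U(x) * density^beta

# Usage: query_idx = np.argmax(information_density_scores(X_pool, base_utilities))
```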
Zhu et al. [13] developed a technique similar to the information density technique of Settles and Craven, selecting instances according to an uncertainty-based criterion modified by a density factor:

U_n(x) = U(x) \cdot \mathrm{KNN}(x)

where KNN(x) is the average cosine similarity of the K nearest neighbors to x. The same authors also propose sampling by clustering, a density-only AL heuristic where the problem space is clustered and the points closest to the cluster centroids are selected for labeling.
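Both heuristics can be sketched along the following lines; the use of scikit-learn's KMeans, the cosine-similarity neighborhood, and the parameter names are assumptions made here for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def knn_density_utility(X, utility, k=10):
    """U_n(x) = U(x) * KNN(x), where KNN(x) is the average cosine
    similarity between x and its K nearest neighbors in the pool."""
    X_unit = X / np.clip(np.linalg.norm(X, axis=1, keepdims=True), 1e-12, None)
    sim = X_unit @ X_unit.T
    np.fill_diagonal(sim, -np.inf)              # ignore self-similarity
    knn_sim = np.sort(sim, axis=1)[:, -k:]      # K most similar neighbors
    return utility * knn_sim.mean(axis=1)

def sampling_by_clustering(X, n_queries=5, random_state=0):
    """Density-only heuristic: cluster the pool and return the index of
    the point closest to each cluster centroid."""
    km = KMeans(n_clusters=n_queries, random_state=random_state).fit(X)
    dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
    return np.array([np.where(km.labels_ == c)[0][np.argmin(dist[km.labels_ == c])]
                     for c in range(n_queries)])
```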
Pre-Clustering. Here it is assumed that the problem is expressed as a mixture model comprising K distributions, each component model completely encoding the information related to the labels of member examples: the label y is conditionally independent of the covariates x given knowledge of its cluster, k [14]. This assumption yields a joint distribution describing the problem: p(x, y, k) = p(x | k) p(y | k) p(k), yielding a posterior probability on y:

p_K(y \mid x) = \sum_{k=1}^{K} p(y \mid k)\, \frac{p(x \mid k)\, p(k)}{p(x)}
In essence, this is a density-weighted mixture model used for classification. The K clusters are created by an application of typical clustering techniques to the data, with the cluster sizes used to estimate p(k), and p(y | k) is estimated via a logistic regression over each cluster's representative example. A probability density is inferred for each cluster; in the example case presented in the earlier-mentioned work, an isotropic normal distribution is used, from which p(x | k) can be estimated. Examples are then selected using an uncertainty score computed from the above-mentioned posterior model, weighted by the probability of observing a given x:

U_K(x) = \left( 1 - \left| p_K(y \mid x) \right| \right) p(x)
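The full pre-clustering pipeline might be sketched as follows for a binary problem, assuming K-means clustering, isotropic Gaussians for p(x | k), a vector p_y_given_k of estimated p(y = 1 | k) values obtained from the labeled cluster representatives, and the margin |2 p_K(y = 1 | x) − 1| as the interpretation of |p_K(y | x)|; all of these concrete choices and names are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.stats import multivariate_normal

def preclustering_scores(X_pool, p_y_given_k, n_clusters, sigma=1.0, random_state=0):
    """Compute U_K(x) = (1 - |p_K(y | x)|) * p(x) for a binary problem (a sketch)."""
    km = KMeans(n_clusters=n_clusters, random_state=random_state).fit(X_pool)

    # p(k): estimated from relative cluster sizes.
    p_k = np.bincount(km.labels_, minlength=n_clusters) / len(X_pool)

    # p(x | k): isotropic normal density centered on each cluster centroid.
    p_x_given_k = np.stack(
        [multivariate_normal.pdf(X_pool, mean=mu, cov=sigma ** 2)
         for mu in km.cluster_centers_], axis=1)      # shape (n, K)

    p_x = p_x_given_k @ p_k                           # marginal p(x)
    # Posterior p_K(y = 1 | x) = sum_k p(y = 1 | k) p(x | k) p(k) / p(x).
    posterior = (p_x_given_k * p_k) @ p_y_given_k / p_x

    # Uncertainty, interpreted here as 1 - |2p - 1| (largest near p = 0.5),
    # weighted by the density p(x).
    return (1.0 - np.abs(2.0 * posterior - 1.0)) * p_x

# Usage: query_idx = np.argmax(preclustering_scores(X_pool, p_y_given_k, n_clusters=K))
```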
Of course, there exists a variety of other techniques in the research literature designed to explicitly incorporate information related to the problem's density into an active selection criterion. McCallum and Nigam [15] modify a query-by-committee approach to use an exponentiated Kullback-Leibler (KL) divergence-based uncertainty metric and combine this with semi-supervised learning in the form of expectation-maximization (EM) over the unlabeled pool.
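As a rough illustration of that disagreement measure (not the authors' exact formulation), the sketch below scores one example by the mean KL divergence of each committee member's predicted class distribution from the committee consensus; exponentiating this score and weighting it by density, as described in the text, would be applied on top.

```python
import numpy as np

def kl_committee_disagreement(committee_probs, eps=1e-12):
    """Mean KL divergence of each member's class distribution from the
    consensus (average) distribution for a single example (a sketch).

    committee_probs : array of shape (n_members, n_classes).
    """
    p = np.clip(committee_probs, eps, None)
    p = p / p.sum(axis=1, keepdims=True)              # renormalize after clipping
    consensus = p.mean(axis=0)
    kl = np.sum(p * np.log(p / consensus), axis=1)    # KL(member || consensus)
    return kl.mean()
```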