space. Given a similarity metric between two points, sim(x, x'), information density selects examples according to:

$$U_m(x) = U(x) \cdot \left( \frac{1}{|X|} \sum_{x' \in X} \mathrm{sim}(x, x') \right)^{\beta}$$

Here, β is a hyper-parameter controlling the trade-off between the raw instance-specific utility, U(x), and the similarity component in the overall selection criterion.
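As a concrete illustration, the following is a minimal sketch of this criterion in Python, assuming cosine similarity for sim and a precomputed utility vector U(x) (e.g., predictive entropy); the function name and array shapes are illustrative, not taken from the cited work:

```python
import numpy as np

def information_density_scores(utility, X, beta=1.0):
    """Weight raw per-instance utility U(x) by the average
    similarity of x to the rest of the unlabeled pool X."""
    # Cosine similarity between every pair of pool instances.
    X_unit = X / np.clip(np.linalg.norm(X, axis=1, keepdims=True), 1e-12, None)
    sim = X_unit @ X_unit.T                 # (n, n) similarity matrix

    density = sim.mean(axis=1)              # (1/|X|) * sum_x' sim(x, x')
    return utility * density ** beta        # U_m(x) = U(x) * density^beta

# Usage: query the instance with the highest density-weighted utility.
rng = np.random.default_rng(0)
X_pool = rng.normal(size=(200, 8))          # unlabeled pool
U = rng.uniform(size=200)                   # stand-in for model uncertainty
query_idx = int(np.argmax(information_density_scores(U, X_pool)))
```

With β = 0 the criterion reduces to plain uncertainty sampling; larger values of β push selection toward dense regions of the pool.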
Zhu et al. [13] developed a technique similar to the information density technique of Settles and Craven, selecting instances according to an uncertainty-based criterion modified by a density factor:

$$U_n(x) = U(x) \cdot \mathrm{KNN}(x),$$

where KNN(x) is the average cosine similarity of the K nearest neighbors to x. The same authors also propose sampling by clustering, a density-only AL heuristic where the problem space is clustered, and the points closest to the cluster centroids are selected for labeling.
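A sketch of both heuristics, assuming cosine similarity, a precomputed uncertainty vector, and scikit-learn's KMeans for the clustering variant; names and parameter choices are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

def knn_density_scores(uncertainty, X, k=10):
    """U_n(x) = U(x) * KNN(x), where KNN(x) is the mean cosine
    similarity between x and its k nearest neighbors in the pool."""
    sim = cosine_similarity(X)              # (n, n)
    np.fill_diagonal(sim, -np.inf)          # exclude self-similarity
    top_k = np.sort(sim, axis=1)[:, -k:]    # k most similar neighbors
    return uncertainty * top_k.mean(axis=1)

def sampling_by_clustering(X, n_queries=5, seed=0):
    """Density-only heuristic: cluster the pool and pick the point
    closest to each cluster centroid for labeling."""
    km = KMeans(n_clusters=n_queries, n_init=10, random_state=seed).fit(X)
    picks = []
    for c in range(n_queries):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
        picks.append(int(members[np.argmin(dists)]))
    return picks
```

Note that sampling by clustering needs no trained model at all, which makes it usable for seeding the very first labeling round.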
Pre-Clustering. Here it is assumed that the problem is expressed as a mixture model comprising K distributions, each component model completely encoding the information related to the labels of member examples: the label y is conditionally independent of the covariates x given knowledge of its cluster, k [14]. This assumption yields a joint distribution describing the problem, p(x, y, k) = p(x | k) p(y | k) p(k), yielding a posterior probability on y:

$$p_k(y \mid x) = \sum_{k=1}^{K} p(y \mid k) \, \frac{p(x \mid k) \, p(k)}{p(x)}$$
In essence, this is a density-weighted mixture model used for classification. The K clusters are created by an application of typical clustering techniques to the data, with cluster size used to estimate p(k), and p(y | k) is estimated via a logistic regression on a cluster's representative example. A probability density is inferred for each cluster; in the example case presented in the earlier-mentioned work, an isotropic normal distribution is used, from which p(x | k) can be estimated. Examples are then selected by an uncertainty score computed via the above-mentioned posterior model, weighted by the probability of observing a given x:

$$U_k(x) = \left( 1 - |p_k(y \mid x)| \right) p(x)$$
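A sketch of the whole pre-clustering pipeline under the assumptions stated above: KMeans provides the clusters, cluster proportions estimate p(k), an isotropic Gaussian per cluster gives p(x | k), and p(y = +1 | k) is assumed to be supplied (e.g., from a logistic regression on each cluster's representative example). Labels are coded y in {-1, +1}, so p_k(y | x) is read as the signed expectation E[y | x], with |p_k(y | x)| near zero meaning maximal uncertainty; all helper names are illustrative:

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.cluster import KMeans

def preclustering_scores(X, p_y_given_k, n_clusters, seed=0):
    """U_k(x) = (1 - |E[y | x]|) * p(x) under a mixture of
    isotropic Gaussians with labels conditionally independent
    of x given the cluster k."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X)
    p_k = np.bincount(km.labels_, minlength=n_clusters) / len(X)  # p(k)

    # Isotropic normal per cluster as the density model for p(x | k).
    lik = np.empty((len(X), n_clusters))
    for k in range(n_clusters):
        var = X[km.labels_ == k].var() + 1e-6   # isotropic variance
        lik[:, k] = multivariate_normal.pdf(
            X, mean=km.cluster_centers_[k], cov=var * np.eye(X.shape[1]))

    p_x = lik @ p_k                             # p(x) = sum_k p(x|k) p(k)
    p_k_given_x = lik * p_k / p_x[:, None]      # Bayes: p(k | x)

    # Signed label expectation E[y | x] = sum_k (2 p(y=1|k) - 1) p(k | x).
    e_y = p_k_given_x @ (2.0 * p_y_given_k - 1.0)
    return (1.0 - np.abs(e_y)) * p_x            # uncertainty * density
```

Selecting the argmax of these scores favors instances that are both ambiguous under the cluster-level label model and located in dense regions of the input space.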
Of course, there exists a variety of other techniques in the research literature designed to explicitly incorporate information related to the problem's density into an active selection criterion. McCallum and Nigam [15] modify a query-by-committee approach to use an exponentiated Kullback-Leibler (KL) divergence-based uncertainty metric and combine this with semi-supervised learning in the form