space. Given a similarity metric between two points, sim(x, x'), information density selects examples according to:

$$U_m(x) = U(x) \cdot \left( \frac{1}{|X|} \sum_{x' \in X} \mathrm{sim}(x, x') \right)^{\beta}$$

Here, β is a hyper-parameter controlling the trade-off between the raw instance-specific utility, U(x), and the similarity component in the overall selection criterion.
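As a concrete illustration, the following is a minimal sketch of this criterion in Python, assuming cosine similarity for sim and a precomputed utility vector U(x) (e.g., predictive entropy); the function name and array shapes are illustrative, not taken from the cited work:

```python
import numpy as np

def information_density_scores(utility, X, beta=1.0):
    """Weight raw per-instance utility U(x) by the average
    similarity of x to the rest of the unlabeled pool X."""
    # Cosine similarity between every pair of pool instances.
    X_unit = X / np.clip(np.linalg.norm(X, axis=1, keepdims=True), 1e-12, None)
    sim = X_unit @ X_unit.T                 # (n, n) similarity matrix

    density = sim.mean(axis=1)              # (1/|X|) * sum_x' sim(x, x')
    return utility * density ** beta        # U_m(x) = U(x) * density^beta

# Usage: query the instance with the highest density-weighted utility.
rng = np.random.default_rng(0)
X_pool = rng.normal(size=(200, 8))          # unlabeled pool
U = rng.uniform(size=200)                   # stand-in for model uncertainty
query_idx = int(np.argmax(information_density_scores(U, X_pool)))
```

With β = 0 the criterion reduces to plain uncertainty sampling; larger values of β push selection toward dense regions of the pool.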
Zhu et al. [13] developed a technique similar to the information density technique of Settles and Craven, selecting instances according to an uncertainty-based criterion modified by a density factor:

$$U_n(x) = U(x) \cdot \mathrm{KNN}(x),$$

where KNN(x) is the average cosine similarity of the K nearest neighbors to x. The same authors also propose sampling by clustering, a density-only AL heuristic where the problem space is clustered, and the points closest to the cluster centroids are selected for labeling.
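A sketch of both heuristics, assuming cosine similarity, a precomputed uncertainty vector, and scikit-learn's KMeans for the clustering variant; names and parameter choices are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

def knn_density_scores(uncertainty, X, k=10):
    """U_n(x) = U(x) * KNN(x), where KNN(x) is the mean cosine
    similarity between x and its k nearest neighbors in the pool."""
    sim = cosine_similarity(X)              # (n, n)
    np.fill_diagonal(sim, -np.inf)          # exclude self-similarity
    top_k = np.sort(sim, axis=1)[:, -k:]    # k most similar neighbors
    return uncertainty * top_k.mean(axis=1)

def sampling_by_clustering(X, n_queries=5, seed=0):
    """Density-only heuristic: cluster the pool and pick the point
    closest to each cluster centroid for labeling."""
    km = KMeans(n_clusters=n_queries, n_init=10, random_state=seed).fit(X)
    picks = []
    for c in range(n_queries):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
        picks.append(int(members[np.argmin(dists)]))
    return picks
```

Note that sampling by clustering needs no trained model at all, which makes it usable for seeding the very first labeling round.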
Pre-Clustering. Here it is assumed that the problem is expressed as a mixture model comprising K distributions, each component model completely encoding the information related to the labels of member examples: the label y is conditionally independent of the covariates x given knowledge of its cluster, k [14]. This assumption yields a joint distribution describing the problem, p(x, y, k) = p(x | k) p(y | k) p(k), yielding a posterior probability on y:

$$p_k(y \mid x) = \sum_{k=1}^{K} p(y \mid k) \, \frac{p(x \mid k) \, p(k)}{p(x)}$$
In essence, this is a density-weighted mixture model used for classification. The K clusters are created by an application of typical clustering techniques to the data, with cluster size used to estimate p(k), and p(y | k) is estimated via a logistic regression on a cluster's representative example. A probability density is inferred for each cluster; in the example case presented in the earlier-mentioned work, an isotropic normal distribution is used, from which p(x | k) can be estimated. Examples are then selected by an uncertainty score computed via the above-mentioned posterior model, weighted by the probability of observing a given x:

$$U_k(x) = \left( 1 - |p_k(y \mid x)| \right) p(x)$$
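A sketch of the whole pre-clustering pipeline under the assumptions stated above: KMeans provides the clusters, cluster proportions estimate p(k), an isotropic Gaussian per cluster gives p(x | k), and p(y = +1 | k) is assumed to be supplied (e.g., from a logistic regression on each cluster's representative example). Labels are coded y in {-1, +1}, so p_k(y | x) is read as the signed expectation E[y | x], with |p_k(y | x)| near zero meaning maximal uncertainty; all helper names are illustrative:

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.cluster import KMeans

def preclustering_scores(X, p_y_given_k, n_clusters, seed=0):
    """U_k(x) = (1 - |E[y | x]|) * p(x) under a mixture of
    isotropic Gaussians with labels conditionally independent
    of x given the cluster k."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X)
    p_k = np.bincount(km.labels_, minlength=n_clusters) / len(X)  # p(k)

    # Isotropic normal per cluster as the density model for p(x | k).
    lik = np.empty((len(X), n_clusters))
    for k in range(n_clusters):
        var = X[km.labels_ == k].var() + 1e-6   # isotropic variance
        lik[:, k] = multivariate_normal.pdf(
            X, mean=km.cluster_centers_[k], cov=var * np.eye(X.shape[1]))

    p_x = lik @ p_k                             # p(x) = sum_k p(x|k) p(k)
    p_k_given_x = lik * p_k / p_x[:, None]      # Bayes: p(k | x)

    # Signed label expectation E[y | x] = sum_k (2 p(y=1|k) - 1) p(k | x).
    e_y = p_k_given_x @ (2.0 * p_y_given_k - 1.0)
    return (1.0 - np.abs(e_y)) * p_x            # uncertainty * density
```

Selecting the argmax of these scores favors instances that are both ambiguous under the cluster-level label model and located in dense regions of the input space.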
Of course, there exists a variety of other techniques in the research literature designed to explicitly incorporate information related to the problem's density into an active selection criterion. McCallum and Nigam [15] modify a query-by-committee approach to use an exponentiated Kullback-Leibler (KL) divergence-based uncertainty metric and combine this with semi-supervised learning in the form