is to reduce the version space as fast as possible, so as to reach the solution sooner and avoid certain costs associated with the problem. For the case of a nonsymmetric version space, more complex selection methods have been suggested by Tong and Koller [8], but the advantage of those methods is not significant, considering their high computational costs.
6.3.2.1 Active Learning with Small Pools The basic working principle of
margin-based AL with SVMs is: (i) train an SVM on the existing training data,
(ii) select the example closest to the hyperplane, and (iii) add the newly selected
example to the training set and train again. In classical AL [8], the search for the
most informative example is performed over the entire dataset. Note that each
iteration of AL involves recomputing every training example's distance
to the new hyperplane. Therefore, for large datasets, searching the entire training
set is a very time-consuming and computationally expensive task.
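This loop can be sketched in a few lines; the version below is only an illustration, assuming scikit-learn's SVC with a linear kernel and a pool whose labels are revealed one at a time to simulate the oracle (the function name and parameters are hypothetical, not from this chapter):

```python
import numpy as np
from sklearn.svm import SVC

def margin_based_active_learning(X_labeled, y_labeled, X_pool, y_pool, n_queries=50):
    """Margin-based AL with SVMs: repeatedly query the pool example closest to the hyperplane."""
    X_train, y_train = X_labeled.copy(), y_labeled.copy()
    pool_idx = np.arange(len(X_pool))                           # indices of still-unlabeled examples
    for _ in range(n_queries):
        clf = SVC(kernel="linear").fit(X_train, y_train)        # (i) train an SVM on the current training data
        dist = np.abs(clf.decision_function(X_pool[pool_idx]))  # distance of every pool example to the hyperplane
        best = pool_idx[np.argmin(dist)]                        # (ii) select the example closest to the hyperplane
        X_train = np.vstack([X_train, X_pool[best:best + 1]])   # (iii) add the newly selected example ...
        y_train = np.append(y_train, y_pool[best])              #       ... and its oracle-provided label
        pool_idx = pool_idx[pool_idx != best]                   # remove it from the unlabeled pool
    return SVC(kernel="linear").fit(X_train, y_train)           # final retraining on the enlarged set
```

Scanning the entire pool at every iteration is exactly the cost that the small-pool strategy described next avoids.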
One possible remedy for this performance bottleneck is to use the “59 trick”
[27], which avoids a full search through the entire dataset by approximating the
most informative examples from a small, constant number of randomly chosen
samples. The method picks L (L ≪ number of training examples) random training
samples in each iteration and selects the best (closest to the hyperplane) among
them. Suppose that, instead of picking the closest example among all the training
samples X_N = (x_1, x_2, ..., x_N) at each iteration, we first pick a random subset
X_L, L ≪ N, and select the closest sample x_i from X_L, subject to the condition
that x_i is among the top p% closest instances in X_N with probability (1 − η).
Any numerical requirement on these constraints can be met by varying the size
of L, independently of N. To see this, note that the probability that at least
one of the L instances is among the closest p fraction is 1 − (1 − p)^L. Owing to the
requirement of (1 − η) probability, we have

1 − (1 − p)^L = 1 − η    (6.6)

which gives the solution for L in terms of η and p:

L = log η / log(1 − p)    (6.7)
For example, the active learner will pick one example, with 95% probability, that
is among the top 5% closest instances to the hyperplane, by randomly sampling
only ⌈log(0.05)/log(0.95)⌉ = 59 examples regardless of the training set size.
This approach scales well because the size L of the subset is independent of the
training set size N; it requires significantly less training time and does not
adversely affect the classification performance of the learner.
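As a rough illustration of how the subset size follows from η and p, the sketch below replaces the full pool scan with a search over L random samples; it again assumes a scikit-learn classifier exposing decision_function, and the helper name and default values are hypothetical:

```python
import numpy as np

def small_pool_query(clf, X_pool, eta=0.05, p=0.05, rng=None):
    """Pick a query point via the '59 trick': examine only L random pool examples.

    With probability (1 - eta), the returned example is among the top p fraction
    of the pool closest to the hyperplane, no matter how large the pool is.
    """
    rng = np.random.default_rng() if rng is None else rng
    L = int(np.ceil(np.log(eta) / np.log(1.0 - p)))       # Eq. (6.7); eta = p = 0.05 gives L = 59
    L = min(L, len(X_pool))                                # never sample more examples than the pool holds
    subset = rng.choice(len(X_pool), size=L, replace=False)
    dist = np.abs(clf.decision_function(X_pool[subset]))   # distances of the L sampled examples
    return subset[np.argmin(dist)]                         # index of the closest sampled example
```

Swapping this subset search into the earlier loop leaves the rest of the procedure unchanged.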
6.3.3 Active Learning with Online Learning
Online learning algorithms are usually associated with problems where the complete
training set is not available. However, in cases where the complete training