4.2.1 Bagging
The basic motivation of parallel ensemble methods is that error can be reduced
dramatically by combining independent base learners. Since it is difficult in practice
to obtain totally independent base learners from the same set of training data,
base learners with less dependency can instead be obtained by injecting randomness
into the learning process.
Bagging [9] is a representative parallel ensemble method. The name is short
for "Bootstrap AGGregatING," indicating the two principal techniques of Bagging,
that is, bootstrap sampling and aggregation. Bagging adopts bootstrap sampling [16] to
inject randomness into the training data, so that the resulting base learners have less
dependency and more diversity. In detail, given the original training set of n examples
$D = \{(x_i, y_i)\}_{i=1}^{n}$, a sample of n training examples is generated by
sampling with replacement. Thus, some original training examples appear several
times in the sample, while others do not appear at all.
By repeating the sampling with replacement T times, T subsets of n training
examples are obtained, which are then used to train T base learners. To form an
ensemble, Bagging combines the base learners by the most popular combination
methods: voting for classification and averaging for regression. The algorithm has
been shown previously. It is worth noting that, because the base learners are trained
independently of one another, the computational cost of Bagging can be further
reduced by training the base learners in parallel.
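The following Python sketch illustrates this procedure under the assumptions stated above (T bootstrap samples of size n, majority voting for classification). The fit/predict interface and the decision-tree default are borrowed from scikit-learn conventions and are not part of the chapter.

```python
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier


def bagging_fit(X, y, T=50, base_learner=None, seed=None):
    """Train T base learners, each on a bootstrap sample of the n training examples."""
    rng = np.random.default_rng(seed)
    n = len(X)
    learners = []
    for _ in range(T):
        idx = rng.integers(0, n, size=n)   # sampling with replacement
        learner = clone(base_learner) if base_learner is not None else DecisionTreeClassifier()
        learner.fit(X[idx], y[idx])        # X, y assumed to be NumPy arrays
        learners.append(learner)
    return learners


def bagging_predict(learners, X):
    """Combine the base learners by majority voting (averaging would be used for regression)."""
    votes = np.stack([h.predict(X) for h in learners])   # shape: (T, n_test)
    # Majority vote per test example; assumes class labels are non-negative integers.
    return np.apply_along_axis(
        lambda col: np.bincount(col.astype(int)).argmax(), 0, votes)
```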
It is quite remarkable that, although injecting randomness makes the training sets
for different base learners diverse, this does not necessarily lead to diverse
base learners. Some learning methods are very stable; they are insensitive to
perturbations of the training sample, such as decision stumps and k-nearest neighbor.
The base learners generated by such methods are quite similar to each other, so
combining them will not improve the generalization ability [15]. Therefore, Bagging
should be used with unstable learners, such as decision trees. Its generalization
ability is improved by reducing the variance through the smoothing effect [9].
And generally, the more unstable the base learners, the larger the performance
improvement.
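As an illustration of this point (not taken from the chapter), the following scikit-learn sketch compares bagging an unstable learner (an unpruned decision tree) with bagging a stable one (k-nearest neighbor); the synthetic dataset and parameter values are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data; any dataset would do for this comparison.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

for name, base in [("decision tree (unstable)", DecisionTreeClassifier(random_state=0)),
                   ("k-NN (stable)", KNeighborsClassifier())]:
    single = cross_val_score(base, X, y, cv=5).mean()
    bagged = cross_val_score(BaggingClassifier(base, n_estimators=50, random_state=0),
                             X, y, cv=5).mean()
    # Bagging is expected to improve the unstable tree noticeably and k-NN hardly at all.
    print(f"{name}: single accuracy = {single:.3f}, bagged accuracy = {bagged:.3f}")
```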
4.2.2 AdaBoost
Boosting-style ensemble methods can boost weak learners into strong learners.
Weak learners perform only slightly better than random guessing, while strong
learners can predict very accurately. Research on boosting originated from an
interesting theoretical question posed by Kearns and Valiant [17]: whether the
weakly learnable problems are equivalent to the strongly learnable problems. If the
answer to the question is positive, the strongly learnable problems can be solved by
"boosting" a series of weak learners. Obviously, this question is of fundamental
importance to machine learning. Schapire [18] gave a constructive proof for this
question, which led to the first boosting algorithm. The basic idea of boosting
is to correct the mistakes made by previous base learners. Specifically, suppose that
the first base learner $h_1$, trained from the original training data, is a weak learner.
Boosting then derives a new distribution $D'$ from the true distribution $D$ to focus
more on the training examples misclassified by $h_1$, and the next base learner is
trained under $D'$.
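The chapter's full AdaBoost pseudocode is not reproduced here, but the reweighting idea can be sketched as follows. This is a standard AdaBoost-style sketch for binary labels in {-1, +1}, not the chapter's own listing; the decision-stump default and the scikit-learn fit/predict interface are assumptions.

```python
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier


def adaboost_fit(X, y, T=50, weak_learner=None):
    """Train T weak learners, reweighting the examples misclassified in each round."""
    n = len(X)
    D = np.full(n, 1.0 / n)                  # initial distribution: uniform over examples
    learners, alphas = [], []
    for _ in range(T):
        h = clone(weak_learner) if weak_learner is not None \
            else DecisionTreeClassifier(max_depth=1)   # decision stump as weak learner
        h.fit(X, y, sample_weight=D)
        pred = h.predict(X)
        eps = D[pred != y].sum()             # weighted training error under D
        if eps >= 0.5:                        # no better than random guessing: stop
            break
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))
        D = D * np.exp(-alpha * y * pred)    # up-weight mistakes, down-weight correct ones
        D = D / D.sum()                      # renormalize to a distribution
        learners.append(h)
        alphas.append(alpha)
    return learners, alphas


def adaboost_predict(learners, alphas, X):
    """Weighted-vote combination of the weak learners."""
    score = sum(a * h.predict(X) for h, a in zip(learners, alphas))
    return np.sign(score)
```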