4.2.1 Bagging
The basic motivation of parallel ensemble methods is that error can be reduced
dramatically by combining independent base learners. Since it is difficult in practice
to obtain totally independent base learners from the same set of training data,
base learners with less dependency can instead be obtained by injecting randomness
into the learning process.
Bagging [9] is a representative parallel ensemble method. The name is short
for "Bootstrap AGGregatING," indicating the two principal techniques of Bagging,
that is, bootstrap sampling and aggregation. Bagging adopts bootstrap sampling [16] to
inject randomness into the training data, so that the resulting base learners have less
dependency and more diversity. In detail, given the original training set of n examples
$D = \{(x_i, y_i)\}_{i=1}^{n}$, a sample of n training examples is generated by
sampling with replacement. Thus, some original training examples appear several
times in the sample, while others do not appear at all.
By repeating the sampling with replacement T times, T subsets of n training
examples are obtained, which are then used to train T base learners. To form an
ensemble, Bagging combines the base learners by the most popular combination
methods: voting for classification and averaging for regression. The algorithm has
been shown previously. It is worth noting that, because the base learners are trained
independently of one another, the computational cost of Bagging can be further
reduced by training the base learners in parallel.
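The following Python sketch illustrates this procedure under the assumptions stated above (T bootstrap samples of size n, majority voting for classification). The fit/predict interface and the decision-tree default are borrowed from scikit-learn conventions and are not part of the chapter.

```python
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier


def bagging_fit(X, y, T=50, base_learner=None, seed=None):
    """Train T base learners, each on a bootstrap sample of the n training examples."""
    rng = np.random.default_rng(seed)
    n = len(X)
    learners = []
    for _ in range(T):
        idx = rng.integers(0, n, size=n)   # sampling with replacement
        learner = clone(base_learner) if base_learner is not None else DecisionTreeClassifier()
        learner.fit(X[idx], y[idx])        # X, y assumed to be NumPy arrays
        learners.append(learner)
    return learners


def bagging_predict(learners, X):
    """Combine the base learners by majority voting (averaging would be used for regression)."""
    votes = np.stack([h.predict(X) for h in learners])   # shape: (T, n_test)
    # Majority vote per test example; assumes class labels are non-negative integers.
    return np.apply_along_axis(
        lambda col: np.bincount(col.astype(int)).argmax(), 0, votes)
```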
It is quite remarkable that, although injecting randomness makes the training sets
for different base learners diverse, this does not necessarily lead to diverse
base learners. Some learning methods are very stable; they are insensitive to
perturbations of the training sample, such as decision stumps and k-nearest neighbor.
The base learners generated by such methods are quite similar to each other, so
combining them will not improve the generalization ability [15]. Therefore, Bagging
should be used with unstable learners, such as decision trees. Its generalization
ability is improved by reducing the variance through the smoothing effect [9].
And generally, the more unstable the base learners, the larger the performance
improvement.
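As an illustration of this point (not taken from the chapter), the following scikit-learn sketch compares bagging an unstable learner (an unpruned decision tree) with bagging a stable one (k-nearest neighbor); the synthetic dataset and parameter values are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data; any dataset would do for this comparison.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

for name, base in [("decision tree (unstable)", DecisionTreeClassifier(random_state=0)),
                   ("k-NN (stable)", KNeighborsClassifier())]:
    single = cross_val_score(base, X, y, cv=5).mean()
    bagged = cross_val_score(BaggingClassifier(base, n_estimators=50, random_state=0),
                             X, y, cv=5).mean()
    # Bagging is expected to improve the unstable tree noticeably and k-NN hardly at all.
    print(f"{name}: single accuracy = {single:.3f}, bagged accuracy = {bagged:.3f}")
```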
4.2.2 AdaBoost
Boosting-style ensemble methods can boost weak learners into strong learners.
Weak learners perform only slightly better than random guessing, while strong
learners can predict very accurately. Research on boosting originated from an
interesting theoretical question posed by Kearns and Valiant [17]: whether the
weakly learnable problems are equivalent to the strongly learnable problems. If the
answer to the question is positive, the strongly learnable problems can be solved by
"boosting" a series of weak learners. Obviously, this question is of fundamental
importance to machine learning. Schapire [18] gave a constructive proof for this
question, which led to the first boosting algorithm. The basic idea of boosting
is to correct the mistakes made by previous base learners. Specifically, suppose that
the first base learner $h_1$, trained from the original training data, is a weak learner.
Boosting then derives a new distribution $D'$ from the true distribution $D$ to focus
more on the training examples misclassified by $h_1$, and the next base learner is
trained under $D'$.
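The chapter's full AdaBoost pseudocode is not reproduced here, but the reweighting idea can be sketched as follows. This is a standard AdaBoost-style sketch for binary labels in {-1, +1}, not the chapter's own listing; the decision-stump default and the scikit-learn fit/predict interface are assumptions.

```python
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier


def adaboost_fit(X, y, T=50, weak_learner=None):
    """Train T weak learners, reweighting the examples misclassified in each round."""
    n = len(X)
    D = np.full(n, 1.0 / n)                  # initial distribution: uniform over examples
    learners, alphas = [], []
    for _ in range(T):
        h = clone(weak_learner) if weak_learner is not None \
            else DecisionTreeClassifier(max_depth=1)   # decision stump as weak learner
        h.fit(X, y, sample_weight=D)
        pred = h.predict(X)
        eps = D[pred != y].sum()             # weighted training error under D
        if eps >= 0.5:                        # no better than random guessing: stop
            break
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))
        D = D * np.exp(-alpha * y * pred)    # up-weight mistakes, down-weight correct ones
        D = D / D.sum()                      # renormalize to a distribution
        learners.append(h)
        alphas.append(alpha)
    return learners, alphas


def adaboost_predict(learners, alphas, X):
    """Weighted-vote combination of the weak learners."""
    score = sum(a * h.predict(X) for h, a in zip(learners, alphas))
    return np.sign(score)
```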