from the training data. Many efforts have been devoted to designing methods that
can generate learners with strong generalization ability. Ensemble learning is
one of the most successful paradigms. Unlike ordinary machine learning methods
(which usually generate a single learner), ensemble methods train a set of base
learners from the training data, obtain a prediction from each of them, and then
combine these predictions to give the final decision.
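As a rough illustration of this train-then-combine workflow, the following Python sketch trains a few heterogeneous base learners and combines their per-instance predictions by majority vote. The dataset, the particular learners, and the variable names are illustrative assumptions, not part of the text.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# A set of base learners (an arbitrary illustrative choice).
base_learners = [
    DecisionTreeClassifier(max_depth=3),
    KNeighborsClassifier(n_neighbors=5),
    LogisticRegression(max_iter=1000),
]

# Train each base learner and collect its predictions, shape (T, n).
predictions = np.array([clf.fit(X, y).predict(X) for clf in base_learners])

# Combine the individual predictions by majority vote for the final decision.
final = np.array([np.bincount(predictions[:, i]).argmax()
                  for i in range(X.shape[0])])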
A remarkable property of ensembles is that they can boost learners performing
only slightly better than random guessing into learners with strong generaliza-
tion ability. For this reason, the base learners are often referred to as weak learners .
This also implies that in ensemble methods the base learners may have weak
generalization ability. In fact, most learning algorithms, such as decision trees,
neural networks, or other machine learning methods, can be invoked to train the
base learners, and ensemble methods can then boost their performance.
According to how the base learners are generated, ensemble methods can be
roughly categorized into two paradigms: parallel ensemble methods and sequen-
tial ensemble methods. Parallel ensemble methods generate base learners in
parallel, with Bagging [9] as a representative. Sequential ensemble methods
generate base learners sequentially, where earlier base learners influence the
generation of subsequent ones, with AdaBoost [10] as a representative. We will
briefly introduce Bagging and AdaBoost in Sections 4.2.1 and 4.2.2. After
generating the base learners, rather than trying to select the best individual
learner, ensemble methods combine them with a combination method. Several
combination methods are popular, such as averaging, voting, and stacking
[11-13], two of which are sketched below.
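To make the averaging and voting combiners concrete, here is a small sketch; the function names and example numbers are illustrative assumptions. Stacking, which trains a second-level learner on the base learners' outputs, is omitted for brevity.

import numpy as np

def average_combine(outputs):
    """Average real-valued outputs of shape (T, n) over the T base learners."""
    return np.mean(outputs, axis=0)

def majority_vote(labels):
    """Majority-vote integer class labels of shape (T, n) over the T base learners."""
    labels = np.asarray(labels)
    return np.array([np.bincount(labels[:, i]).argmax()
                     for i in range(labels.shape[1])])

# Example: three base learners, four test instances (made-up numbers).
reg_outputs = np.array([[0.9, 0.2, 0.4, 0.7],
                        [1.1, 0.1, 0.5, 0.6],
                        [1.0, 0.3, 0.3, 0.8]])
cls_labels = np.array([[0, 1, 2, 1],
                       [0, 1, 1, 1],
                       [2, 1, 2, 1]])
print(average_combine(reg_outputs))   # approximately [1.0, 0.2, 0.4, 0.7]
print(majority_vote(cls_labels))      # [0, 1, 2, 1]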
Generally speaking, to obtain a good ensemble, the base learners should be as
accurate as possible and as diverse as possible, as formally shown by Krogh and
Vedelsby [14] and emphasized and exploited by many others. Diversity among
the base learners can be obtained in different ways, such as sampling the training
data, manipulating the attributes, manipulating the outputs, injecting randomness
into the learning process, or even using multiple mechanisms simultaneously
(two of these mechanisms are sketched below). For a comprehensive introduction
to ensemble learning, please refer to [15].
Algorithm: The Bagging algorithm for classification

Input: Data set D = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}
       Base learning algorithm L
       The number of iterations T

1: for t = 1 to T do
2:     h_t = L(D, D_bs)    /* D_bs is the bootstrap distribution */
3: end for

Output: H(x) = argmax_y Σ_{t=1}^{T} I(h_t(x) = y)
        /* I(a) = 1 if a is true, and 0 otherwise */
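A runnable rendering of this pseudocode is sketched below, using scikit-learn decision trees as the base learning algorithm L; the usual way of realizing the bootstrap distribution D_bs is to draw n instances from D with replacement. The choice of decision trees, T = 25 rounds, the seed, and the function names are assumptions made for illustration.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, T=25, seed=0):
    """Train T base learners, each on a bootstrap sample of (X, y) (numpy arrays)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    learners = []
    for _ in range(T):
        idx = rng.integers(0, n, size=n)  # draw n instances with replacement
        learners.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return learners

def bagging_predict(learners, X):
    """H(x) = argmax_y sum_t I(h_t(x) = y): plurality vote over the T learners."""
    votes = np.array([h.predict(X) for h in learners])   # shape (T, n_test)
    # Assumes integer class labels so that bincount can tally the votes.
    return np.array([np.bincount(votes[:, i].astype(int)).argmax()
                     for i in range(votes.shape[1])])

Calling bagging_fit on a labeled training set and then bagging_predict on test points mirrors the for-loop and the voting output of the pseudocode.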