from the training data. Many efforts have been devoted to designing methods that
can generate learners with strong generalization ability. Ensemble learning is
one of the most successful paradigms. Unlike ordinary machine learning methods
(which usually generate a single learner), ensemble methods train a set of base
learners from the training data, obtain a prediction from each of them, and then
combine these predictions to give the final decision.
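As a rough illustration of this train-then-combine workflow, the following Python sketch trains a few heterogeneous base learners and combines their per-instance predictions by majority vote. The dataset, the particular learners, and the variable names are illustrative assumptions, not part of the text.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# A set of base learners (an arbitrary illustrative choice).
base_learners = [
    DecisionTreeClassifier(max_depth=3),
    KNeighborsClassifier(n_neighbors=5),
    LogisticRegression(max_iter=1000),
]

# Train each base learner and collect its predictions, shape (T, n).
predictions = np.array([clf.fit(X, y).predict(X) for clf in base_learners])

# Combine the individual predictions by majority vote for the final decision.
final = np.array([np.bincount(predictions[:, i]).argmax()
                  for i in range(X.shape[0])])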
A remarkable property of ensembles is that they can boost learners performing
only slightly better than random guessing into learners with strong generaliza-
tion ability. For this reason, the base learners are often referred to as weak learners .
This also implies that in ensemble methods the base learners may have weak
generalization ability. In fact, most learning algorithms, such as decision trees,
neural networks, or other machine learning methods, can be invoked to train the
base learners, and ensemble methods can then boost their performance.
According to how the base learners are generated, ensemble methods can be
roughly categorized into two paradigms: parallel ensemble methods and sequen-
tial ensemble methods. Parallel ensemble methods generate base learners in
parallel, with Bagging [9] as a representative. Sequential ensemble methods
generate base learners sequentially, where earlier base learners influence the
generation of subsequent ones, with AdaBoost [10] as a representative. We will
briefly introduce Bagging and AdaBoost in Sections 4.2.1 and 4.2.2. After
generating the base learners, rather than trying to select the best individual
learner, ensemble methods combine them with a combination method. Several
combination methods are popular, such as averaging, voting, and stacking
[11-13], two of which are sketched below.
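To make the averaging and voting combiners concrete, here is a small sketch; the function names and example numbers are illustrative assumptions. Stacking, which trains a second-level learner on the base learners' outputs, is omitted for brevity.

import numpy as np

def average_combine(outputs):
    """Average real-valued outputs of shape (T, n) over the T base learners."""
    return np.mean(outputs, axis=0)

def majority_vote(labels):
    """Majority-vote integer class labels of shape (T, n) over the T base learners."""
    labels = np.asarray(labels)
    return np.array([np.bincount(labels[:, i]).argmax()
                     for i in range(labels.shape[1])])

# Example: three base learners, four test instances (made-up numbers).
reg_outputs = np.array([[0.9, 0.2, 0.4, 0.7],
                        [1.1, 0.1, 0.5, 0.6],
                        [1.0, 0.3, 0.3, 0.8]])
cls_labels = np.array([[0, 1, 2, 1],
                       [0, 1, 1, 1],
                       [2, 1, 2, 1]])
print(average_combine(reg_outputs))   # approximately [1.0, 0.2, 0.4, 0.7]
print(majority_vote(cls_labels))      # [0, 1, 2, 1]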
Generally speaking, to obtain a good ensemble, the base learners should be as
accurate as possible and as diverse as possible, as formally shown by Krogh and
Vedelsby [14] and emphasized and exploited by many others. Diversity among
the base learners can be obtained in different ways, such as sampling the training
data, manipulating the attributes, manipulating the outputs, injecting randomness
into the learning process, or even using multiple mechanisms simultaneously
(two of these mechanisms are sketched below). For a comprehensive introduction
to ensemble learning, please refer to [15].
Algorithm: The Bagging algorithm for classification

Input: Data set D = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}
       Base learning algorithm L
       The number of iterations T

1: for t = 1 to T do
2:     h_t = L(D, D_bs)    /* D_bs is the bootstrap distribution */
3: end for

Output: H(x) = argmax_y Σ_{t=1}^{T} I(h_t(x) = y)
        /* I(a) = 1 if a is true, and 0 otherwise */
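A runnable rendering of this pseudocode is sketched below, using scikit-learn decision trees as the base learning algorithm L; the usual way of realizing the bootstrap distribution D_bs is to draw n instances from D with replacement. The choice of decision trees, T = 25 rounds, the seed, and the function names are assumptions made for illustration.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, T=25, seed=0):
    """Train T base learners, each on a bootstrap sample of (X, y) (numpy arrays)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    learners = []
    for _ in range(T):
        idx = rng.integers(0, n, size=n)  # draw n instances with replacement
        learners.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return learners

def bagging_predict(learners, X):
    """H(x) = argmax_y sum_t I(h_t(x) = y): plurality vote over the T learners."""
    votes = np.array([h.predict(X) for h in learners])   # shape (T, n_test)
    # Assumes integer class labels so that bincount can tally the votes.
    return np.array([np.bincount(votes[:, i].astype(int)).argmax()
                     for i in range(votes.shape[1])])

Calling bagging_fit on a labeled training set and then bagging_predict on test points mirrors the for-loop and the voting output of the pseudocode.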