architecture does not need it and can additionally output pose information alongside the face recognition result, while achieving higher classification accuracy than conventional methods supplied with precise pose information. Moreover, it is not surprising that in recent years almost all the top algorithms in various real-world data-mining competitions, such as KDD-Cup¹ and the Netflix Prize², have exploited ensemble methods.
In class imbalance learning (CIL), ensemble methods are widely used both to improve existing methods and to help design brand-new ones. A famous example is the ensemble method designed by Viola and Jones [2, 3] for face detection. Face detection requires indicating, in real time, which parts of an image contain a face. A typical image has about 50,000 sub-windows representing different scales and locations [2], and for each of them it must be determined whether it contains a face or not. Typically, there are only a few dozen faces among these sub-windows in an image. Furthermore, there are often more non-face images than images containing any face. Thus, non-face sub-windows can outnumber face-containing sub-windows by a factor of $10^4$. Viola and Jones [2, 3] designed a boosting-based ensemble method to deal with this severe class imbalance problem. This method, together with a cascade-style learning structure, is able to achieve a very high detection rate while keeping a very low false-positive rate; see the sketch below. This face detector is recognized as one of the breakthroughs of the past decades. In addition, ensemble methods have been used to improve over-sampling [4] and under-sampling [5, 6], and a number of boosting-based methods have been developed to handle class-imbalanced data [2, 4, 7, 8].
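The following is a minimal sketch of the cascade idea behind such a detector, not the original Viola-Jones training procedure: each stage is a small boosted ensemble whose acceptance threshold is set to retain nearly all remaining positives, so the many easy negatives are rejected cheaply in early stages. The stage count, thresholds, and synthetic data are illustrative assumptions.

```python
# Sketch of a cascade of boosted classifiers for imbalanced detection.
# Assumptions: 3 stages, a 99% per-stage detection rate, and toy data
# with roughly 1 positive per 100 negatives.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=20000, n_features=20,
                           weights=[0.99, 0.01], random_state=0)

def train_cascade(X, y, n_stages=3, min_detection_rate=0.99):
    """Train stages in sequence; each stage's threshold is chosen so
    that it still accepts `min_detection_rate` of remaining positives."""
    stages = []
    for _ in range(n_stages):
        clf = AdaBoostClassifier(n_estimators=20, random_state=0)
        clf.fit(X, y)
        scores = clf.decision_function(X)
        # Threshold below which only 1% of positive scores fall.
        thr = np.quantile(scores[y == 1], 1.0 - min_detection_rate)
        stages.append((clf, thr))
        keep = scores >= thr          # windows passed to the next stage
        X, y = X[keep], y[keep]
        if (y == 1).sum() == 0 or (y == 0).sum() == 0:
            break                     # fitting needs both classes
    return stages

def cascade_predict(stages, X):
    """A window is declared positive only if it survives every stage."""
    alive = np.ones(len(X), dtype=bool)
    for clf, thr in stages:
        if alive.any():
            alive[alive] &= clf.decision_function(X[alive]) >= thr
    return alive.astype(int)

stages = train_cascade(X, y)
print(cascade_predict(stages, X).sum(), "windows accepted as positive")
```

Because each stage discards most negatives, later stages train on progressively less imbalanced data, which is what makes the cascade attractive when negatives vastly outnumber positives.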
We now introduce the notation used in this chapter. By default, we consider binary classification problems. Let $D = \{(x_i, y_i)\}_{i=1}^{n}$ be the training set, with $y \in \{-1, +1\}$. The class with $y = +1$ is the positive class with $n_+$ examples, and we suppose it is the minority class; the class with $y = -1$ is the negative class with $n_-$ examples, and we suppose it is the majority class. So we have $n_+ < n_-$, and the level of imbalance is $r = n_-/n_+$. The subset of the training data containing all the minority-class examples is $\mathcal{P}$, and the subset containing all the majority-class examples is $\mathcal{N}$. Assume that the data are independently and identically sampled from a distribution $\mathcal{D}$ on $\mathcal{X} \times \mathcal{Y}$, where $\mathcal{X}$ is the input space and $\mathcal{Y}$ is the output space. A learning algorithm $L$ trains a classifier $h: \mathcal{X} \to \mathcal{Y}$.
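The following small sketch makes this notation concrete, under the assumption of a label vector with values in $\{-1, +1\}$; the toy data and the 10% positive rate are illustrative, not from the source.

```python
# P and N are the minority (positive) and majority (negative) subsets,
# and r = n_- / n_+ is the level of imbalance.
import numpy as np

X = np.random.randn(1000, 5)                     # toy inputs (assumption)
y = np.where(np.random.rand(1000) < 0.1, 1, -1)  # ~10% positives

P, N = X[y == 1], X[y == -1]   # minority / majority subsets
n_pos, n_neg = len(P), len(N)
r = n_neg / n_pos              # imbalance level r = n_- / n_+
print(f"n+ = {n_pos}, n- = {n_neg}, r = {r:.1f}")
```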
4.2 ENSEMBLE METHODS
The most central concept in machine learning is generalization ability, which indicates how well unseen data can be predicted by the learner trained
¹ KDD-Cup is the most famous data-mining competition, covering various real-world applications such as network intrusion, bioinformatics, and customer relationship management. For further details, refer to http://www.sigkdd.org/kddcup/
² Netflix is an online digital video disk (DVD) rental service. The Netflix Prize is a data-mining competition held every year since 2007 to help improve the accuracy of movie recommendations for users. For further details, refer to http://www.netflixprize.com/