Although SVMs often work effectively with balanced datasets, they can produce suboptimal results with imbalanced datasets. More specifically, an SVM classifier trained on an imbalanced dataset often produces a model that is biased toward the majority class and performs poorly on the minority class. Various data preprocessing and algorithmic techniques have been proposed to overcome this problem for SVMs, and this chapter is dedicated to discussing them. In Section 5.2, we present some background on the SVM learning algorithm. In Section 5.3, we discuss why SVMs are sensitive to the imbalance in datasets. Sections 5.4 and 5.5 present the existing techniques proposed in the literature to handle the class imbalance problem for SVMs. Finally, Section 5.6 summarizes this chapter.
5.2 INTRODUCTION TO SUPPORT VECTOR MACHINES
In this section, we briefly review the learning algorithm of SVMs, which was initially proposed in [1-3]. Let us consider a binary classification problem represented by a dataset {(x_1, y_1), (x_2, y_2), ..., (x_l, y_l)}, where x_i ∈ R^n represents an n-dimensional data point and y_i ∈ {−1, 1} represents the class label of that data point, for i = 1, ..., l. The goal of the SVM learning algorithm is to find the optimal separating hyperplane that effectively separates these data points into two classes.
In order to find a better separation of the classes, the data points are first transformed into a higher dimensional feature space by a nonlinear mapping function Φ. A possible separating hyperplane residing in this transformed feature space can be represented by

w · Φ(x) + b = 0        (5.1)
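To make this decision rule concrete, the short sketch below (a minimal illustration assuming scikit-learn and synthetic toy data, neither of which appears in the original text) fits a kernelized SVM and evaluates the signed quantity w · Φ(x) + b through decision_function; the predicted class is simply its sign.

import numpy as np
from sklearn.svm import SVC

# Illustrative toy data only: two Gaussian blobs labeled -1 and +1.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2) - 2, rng.randn(50, 2) + 2])
y = np.hstack([-np.ones(50), np.ones(50)])

# The RBF kernel implicitly defines the nonlinear mapping Phi; the fitted
# classifier corresponds to a hyperplane w · Phi(x) + b = 0 in that feature space.
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X, y)

# decision_function returns the signed value of w · Phi(x) + b for each point;
# the predicted label is its sign.
scores = clf.decision_function(X[:5])
print(scores)
print(np.sign(scores))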
where w is the weight vector normal to the hyperplane. If the dataset is completely
linearly separable, the separating hyperplane with the maximum margin (for a
higher generalization capability) can be found by solving the following maximal
margin optimization problem:
min_w    (1/2) w · w
s.t.     y_i (w · Φ(x_i) + b) ≥ 1,    i = 1, ..., l        (5.2)
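As a rough numerical illustration of Equation 5.2 (a sketch assuming scikit-learn, where the hard margin is only approximated by a very large penalty C, and the toy data are hypothetical), the snippet below fits a linear SVM on a separable dataset, checks the constraints y_i (w · x_i + b) ≥ 1, and reports the maximized margin width 2/||w||.

import numpy as np
from sklearn.svm import SVC

# Hypothetical linearly separable data: class -1 on the left, class +1 on the right.
X = np.array([[-2.0, 0.0], [-1.5, 1.0], [-2.5, -1.0],
              [2.0, 0.0], [1.5, -1.0], [2.5, 1.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates the hard-margin problem of Equation 5.2:
# minimize (1/2) w · w subject to y_i (w · x_i + b) >= 1.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("margin width 2/||w|| =", 2.0 / np.linalg.norm(w))
print("constraint values y_i (w · x_i + b):", y * (X @ w + b))  # each should be >= 1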
However, in most real-world problems, the datasets are not completely linearly separable even when they are mapped into a higher dimensional feature space. Therefore, the constraints in the optimization problem in Equation 5.2 are relaxed by introducing a set of slack variables ξ_i ≥ 0. Then, the soft margin