Although SVMs often work effectively with balanced datasets, they can produce suboptimal results with imbalanced datasets. More specifically, an SVM classifier trained on an imbalanced dataset often produces a model that is biased toward the majority class and performs poorly on the minority class. Various data preprocessing and algorithmic techniques have been proposed to overcome this problem for SVMs, and this chapter is dedicated to discussing them. In Section 5.2, we present some background on the SVM learning algorithm. In Section 5.3, we discuss why SVMs are sensitive to the imbalance in datasets. Sections 5.4 and 5.5 present the existing techniques proposed in the literature to handle the class imbalance problem for SVMs. Finally, Section 5.6 summarizes this chapter.
5.2 INTRODUCTION TO SUPPORT VECTOR MACHINES
In this section, we briefly review the learning algorithm of SVMs, which was initially proposed in [1-3]. Let us consider a binary classification problem represented by a dataset {(x_1, y_1), (x_2, y_2), ..., (x_l, y_l)}, where x_i ∈ R^n represents an n-dimensional data point and y_i ∈ {−1, 1} represents the class label of that data point, for i = 1, ..., l. The goal of the SVM learning algorithm is to find the optimal separating hyperplane that effectively separates these data points into two classes.
In order to find a better separation of the classes, the data points are first transformed into a higher dimensional feature space by a nonlinear mapping function Φ. A possible separating hyperplane residing in this transformed feature space can be represented by

w · Φ(x) + b = 0        (5.1)
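To make this decision rule concrete, the short sketch below (a minimal illustration assuming scikit-learn and synthetic toy data, neither of which appears in the original text) fits a kernelized SVM and evaluates the signed quantity w · Φ(x) + b through decision_function; the predicted class is simply its sign.

import numpy as np
from sklearn.svm import SVC

# Illustrative toy data only: two Gaussian blobs labeled -1 and +1.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2) - 2, rng.randn(50, 2) + 2])
y = np.hstack([-np.ones(50), np.ones(50)])

# The RBF kernel implicitly defines the nonlinear mapping Phi; the fitted
# classifier corresponds to a hyperplane w · Phi(x) + b = 0 in that feature space.
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X, y)

# decision_function returns the signed value of w · Phi(x) + b for each point;
# the predicted label is its sign.
scores = clf.decision_function(X[:5])
print(scores)
print(np.sign(scores))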
where w is the weight vector normal to the hyperplane. If the dataset is completely
linearly separable, the separating hyperplane with the maximum margin (for a
higher generalization capability) can be found by solving the following maximal
margin optimization problem:
min_w    (1/2) w · w
s.t.     y_i (w · Φ(x_i) + b) ≥ 1,    i = 1, ..., l        (5.2)
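As a rough numerical illustration of Equation 5.2 (a sketch assuming scikit-learn, where the hard margin is only approximated by a very large penalty C, and the toy data are hypothetical), the snippet below fits a linear SVM on a separable dataset, checks the constraints y_i (w · x_i + b) ≥ 1, and reports the maximized margin width 2/||w||.

import numpy as np
from sklearn.svm import SVC

# Hypothetical linearly separable data: class -1 on the left, class +1 on the right.
X = np.array([[-2.0, 0.0], [-1.5, 1.0], [-2.5, -1.0],
              [2.0, 0.0], [1.5, -1.0], [2.5, 1.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates the hard-margin problem of Equation 5.2:
# minimize (1/2) w · w subject to y_i (w · x_i + b) >= 1.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("margin width 2/||w|| =", 2.0 / np.linalg.norm(w))
print("constraint values y_i (w · x_i + b):", y * (X @ w + b))  # each should be >= 1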
However, in most real-world problems, the datasets are not completely linearly separable even when they are mapped into a higher dimensional feature space. Therefore, the constraints in the optimization problem in Equation 5.2 are relaxed by introducing a set of slack variables ξ_i ≥ 0. Then, the soft margin