optimization problem can be reformulated as follows:
\[
\begin{aligned}
\min_{w,\,b,\,\xi} \quad & \frac{1}{2}\, w \cdot w \;+\; C \sum_{i=1}^{l} \xi_i \\
\text{s.t.} \quad & y_i \bigl( w \cdot \phi(x_i) + b \bigr) \;\ge\; 1 - \xi_i, \qquad\qquad (5.3) \\
& \xi_i \ge 0, \quad i = 1, \dots, l
\end{aligned}
\]
The slack variables satisfy ξ_i > 0 for misclassified examples, and therefore the penalty term Σ_{i=1}^{l} ξ_i can be regarded as a measure of the total amount of misclassification (training errors) of the model. The new objective function given in Equation 5.3 has two goals: one is to maximize the margin, and the other is to minimize the number of misclassifications (the penalty term). The parameter C controls the trade-off between these two goals. This quadratic optimization problem can be solved by expressing it as a Lagrangian optimization problem, which has the following dual form:
\[
\begin{aligned}
\max_{\alpha} \quad & \sum_{i=1}^{l} \alpha_i \;-\; \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j\, y_i y_j\, \phi(x_i) \cdot \phi(x_j) \qquad (5.4) \\
\text{s.t.} \quad & \sum_{i=1}^{l} y_i \alpha_i = 0, \qquad 0 \le \alpha_i \le C, \quad i = 1, \dots, l
\end{aligned}
\]
where α_i are Lagrange multipliers, which must satisfy the following Karush-Kuhn-Tucker (KKT) conditions:
\[
\alpha_i \bigl( y_i ( w \cdot \phi(x_i) + b ) - 1 + \xi_i \bigr) = 0, \quad i = 1, \dots, l \qquad (5.5)
\]
\[
( C - \alpha_i )\, \xi_i = 0, \quad i = 1, \dots, l \qquad (5.6)
\]
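Equations 5.5 and 5.6 imply that only examples with α_i > 0 (the support vectors) contribute to the solution: examples with 0 < α_i < C lie exactly on the margin (ξ_i = 0), while examples with α_i = C may have ξ_i > 0 and violate it. The following is a minimal sketch of how these cases can be inspected in practice, assuming the scikit-learn library and a small synthetic data set (both are illustrative assumptions, not part of the original formulation):

```python
# Hedged sketch: fit a soft-margin SVM with scikit-learn and separate
# margin support vectors (0 < alpha_i < C) from bound ones (alpha_i = C).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Illustrative synthetic data set (not from the text).
X, y = make_classification(n_samples=100, n_features=2, n_redundant=0,
                           n_clusters_per_class=1, random_state=0)

C = 1.0
clf = SVC(C=C, kernel="linear")
clf.fit(X, y)

# dual_coef_ stores y_i * alpha_i for the support vectors,
# so its absolute value gives the alpha_i values.
alphas = np.abs(clf.dual_coef_).ravel()
on_margin = alphas < C - 1e-8   # 0 < alpha_i < C: y_i f(x_i) = 1, xi_i = 0
bound = ~on_margin              # alpha_i = C: slack xi_i may be positive

print("support vectors:  ", len(clf.support_))
print("on the margin:    ", int(on_margin.sum()))
print("bound (alpha = C):", int(bound.sum()))
```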
An important property of SVMs is that the mapping function φ(x) never needs to be known explicitly. By applying a kernel function K(x_i, x_j) = φ(x_i) · φ(x_j), the dual optimization problem given in Equation 5.4 can be transformed into Equation 5.7:
\[
\begin{aligned}
\max_{\alpha} \quad & \sum_{i=1}^{l} \alpha_i \;-\; \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j\, y_i y_j\, K(x_i, x_j) \qquad (5.7) \\
\text{s.t.} \quad & \sum_{i=1}^{l} y_i \alpha_i = 0, \qquad 0 \le \alpha_i \le C, \quad i = 1, \dots, l
\end{aligned}
\]
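Because Equation 5.7 is a standard quadratic program in α, it can be handed to any generic QP solver once the kernel matrix K(x_i, x_j) has been computed. The sketch below shows one way to do this, assuming NumPy and the cvxopt package; the RBF kernel, the toy data, and all parameter values are illustrative assumptions rather than choices made in the text:

```python
# Hedged sketch: solve the kernelized dual (5.7) with a generic QP solver.
import numpy as np
from cvxopt import matrix, solvers

rng = np.random.default_rng(0)
# Two Gaussian blobs with labels y_i in {-1, +1} (illustrative data).
X = np.vstack([rng.normal(-1.0, 0.7, (40, 2)),
               rng.normal(+1.0, 0.7, (40, 2))])
y = np.hstack([-np.ones(40), np.ones(40)])

def rbf_kernel(A, B, gamma=0.5):
    """K(a, b) = exp(-gamma * ||a - b||^2) (an assumed kernel choice)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

l, C = len(y), 1.0
K = rbf_kernel(X, X)

# Equation 5.7 as the QP:  min (1/2) a'Pa + q'a  s.t.  Ga <= h,  Aa = b
P = matrix(np.outer(y, y) * K)
q = matrix(-np.ones(l))
G = matrix(np.vstack([-np.eye(l), np.eye(l)]))   # -a_i <= 0  and  a_i <= C
h = matrix(np.hstack([np.zeros(l), C * np.ones(l)]))
A = matrix(y.reshape(1, -1))                     # sum_i y_i a_i = 0
b = matrix(0.0)

solvers.options["show_progress"] = False
alpha = np.ravel(solvers.qp(P, q, G, h, A, b)["x"])

# Recover the bias from a margin support vector (0 < alpha_i < C),
# where Equations 5.5-5.6 give y_i f(x_i) = 1.
m = np.where((alpha > 1e-6) & (alpha < C - 1e-6))[0][0]
bias = y[m] - np.sum(alpha * y * K[:, m])

def predict(X_new):
    return np.sign(rbf_kernel(X_new, X) @ (alpha * y) + bias)

print("training accuracy:", (predict(X) == y).mean())
```

In practice, dedicated decomposition methods such as SMO are preferred over a generic QP solver, since the kernel matrix grows quadratically with the number of training examples l.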