optimization problem can be reformulated as follows:
\[
\begin{aligned}
\min_{w,\,b,\,\xi} \quad & \frac{1}{2}\, w \cdot w \;+\; C \sum_{i=1}^{l} \xi_i \\
\text{s.t.} \quad & y_i \bigl( w \cdot \phi(x_i) + b \bigr) \;\ge\; 1 - \xi_i, \qquad\qquad (5.3) \\
& \xi_i \ge 0, \quad i = 1, \dots, l
\end{aligned}
\]
The slack variables satisfy ξ_i > 0 for misclassified examples, and therefore the penalty term Σ_{i=1}^{l} ξ_i can be regarded as a measure of the total amount of misclassification (training errors) of the model. The new objective function given in Equation 5.3 has two goals: one is to maximize the margin, and the other is to minimize the number of misclassifications (the penalty term). The parameter C controls the trade-off between these two goals. This quadratic optimization problem can be solved by expressing it as a Lagrangian optimization problem, which has the following dual form:
\[
\begin{aligned}
\max_{\alpha} \quad & \sum_{i=1}^{l} \alpha_i \;-\; \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j\, y_i y_j\, \phi(x_i) \cdot \phi(x_j) \qquad (5.4) \\
\text{s.t.} \quad & \sum_{i=1}^{l} y_i \alpha_i = 0, \qquad 0 \le \alpha_i \le C, \quad i = 1, \dots, l
\end{aligned}
\]
where α_i are Lagrange multipliers, which must satisfy the following Karush-Kuhn-Tucker (KKT) conditions:
\[
\alpha_i \bigl( y_i ( w \cdot \phi(x_i) + b ) - 1 + \xi_i \bigr) = 0, \quad i = 1, \dots, l \qquad (5.5)
\]
\[
( C - \alpha_i )\, \xi_i = 0, \quad i = 1, \dots, l \qquad (5.6)
\]
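Equations 5.5 and 5.6 imply that only examples with α_i > 0 (the support vectors) contribute to the solution: examples with 0 < α_i < C lie exactly on the margin (ξ_i = 0), while examples with α_i = C may have ξ_i > 0 and violate it. The following is a minimal sketch of how these cases can be inspected in practice, assuming the scikit-learn library and a small synthetic data set (both are illustrative assumptions, not part of the original formulation):

```python
# Hedged sketch: fit a soft-margin SVM with scikit-learn and separate
# margin support vectors (0 < alpha_i < C) from bound ones (alpha_i = C).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Illustrative synthetic data set (not from the text).
X, y = make_classification(n_samples=100, n_features=2, n_redundant=0,
                           n_clusters_per_class=1, random_state=0)

C = 1.0
clf = SVC(C=C, kernel="linear")
clf.fit(X, y)

# dual_coef_ stores y_i * alpha_i for the support vectors,
# so its absolute value gives the alpha_i values.
alphas = np.abs(clf.dual_coef_).ravel()
on_margin = alphas < C - 1e-8   # 0 < alpha_i < C: y_i f(x_i) = 1, xi_i = 0
bound = ~on_margin              # alpha_i = C: slack xi_i may be positive

print("support vectors:  ", len(clf.support_))
print("on the margin:    ", int(on_margin.sum()))
print("bound (alpha = C):", int(bound.sum()))
```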
An important property of SVMs is that the mapping function φ(x) never needs to be known explicitly. By applying a kernel function K(x_i, x_j) = φ(x_i) · φ(x_j), the dual optimization problem given in Equation 5.4 can be transformed into Equation 5.7:
\[
\begin{aligned}
\max_{\alpha} \quad & \sum_{i=1}^{l} \alpha_i \;-\; \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j\, y_i y_j\, K(x_i, x_j) \qquad (5.7) \\
\text{s.t.} \quad & \sum_{i=1}^{l} y_i \alpha_i = 0, \qquad 0 \le \alpha_i \le C, \quad i = 1, \dots, l
\end{aligned}
\]
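Because Equation 5.7 is a standard quadratic program in α, it can be handed to any generic QP solver once the kernel matrix K(x_i, x_j) has been computed. The sketch below shows one way to do this, assuming NumPy and the cvxopt package; the RBF kernel, the toy data, and all parameter values are illustrative assumptions rather than choices made in the text:

```python
# Hedged sketch: solve the kernelized dual (5.7) with a generic QP solver.
import numpy as np
from cvxopt import matrix, solvers

rng = np.random.default_rng(0)
# Two Gaussian blobs with labels y_i in {-1, +1} (illustrative data).
X = np.vstack([rng.normal(-1.0, 0.7, (40, 2)),
               rng.normal(+1.0, 0.7, (40, 2))])
y = np.hstack([-np.ones(40), np.ones(40)])

def rbf_kernel(A, B, gamma=0.5):
    """K(a, b) = exp(-gamma * ||a - b||^2) (an assumed kernel choice)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

l, C = len(y), 1.0
K = rbf_kernel(X, X)

# Equation 5.7 as the QP:  min (1/2) a'Pa + q'a  s.t.  Ga <= h,  Aa = b
P = matrix(np.outer(y, y) * K)
q = matrix(-np.ones(l))
G = matrix(np.vstack([-np.eye(l), np.eye(l)]))   # -a_i <= 0  and  a_i <= C
h = matrix(np.hstack([np.zeros(l), C * np.ones(l)]))
A = matrix(y.reshape(1, -1))                     # sum_i y_i a_i = 0
b = matrix(0.0)

solvers.options["show_progress"] = False
alpha = np.ravel(solvers.qp(P, q, G, h, A, b)["x"])

# Recover the bias from a margin support vector (0 < alpha_i < C),
# where Equations 5.5-5.6 give y_i f(x_i) = 1.
m = np.where((alpha > 1e-6) & (alpha < C - 1e-6))[0][0]
bias = y[m] - np.sum(alpha * y * K[:, m])

def predict(X_new):
    return np.sign(rbf_kernel(X_new, X) @ (alpha * y) + bias)

print("training accuracy:", (predict(X) == y).mean())
```

In practice, dedicated decomposition methods such as SMO are preferred over a generic QP solver, since the kernel matrix grows quadratically with the number of training examples l.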