$$
\min_{w, b, \xi} \; \frac{1}{2} \|w\|^2 + C\, 1_n^T \xi
\quad \text{s.t.} \quad y_i \left( w^T x_i + b \right) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \;\; i \in \{1, \ldots, n\},
\eqno(2.2)
$$
where $\xi \in \mathbb{R}^n$ is the vector of slack variables associated with each sample; they represent a measure of error ($\xi_i = 0$ when the sample is correctly classified, $0 < \xi_i < 1$ when it is also correctly classified but lies within the margin, and $\xi_i \ge 1$ when it is misclassified). $C$ is the hyperparameter which trades off the contribution of the margin and the error terms. In this thesis we will use the notation $a_b$ to represent a vector of dimension $b$ of the scalar $a$. In this way, $1_n^T \xi$ can be seen as an upper bound on the number of errors.
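The role of the slack variables can be illustrated numerically. The following sketch (not from the thesis; the dataset and hyperplane $(w, b)$ are hypothetical) computes $\xi_i = \max(0,\, 1 - y_i(w^T x_i + b))$ for a toy 1-D problem and checks that $1_n^T \xi$ upper-bounds the number of misclassifications:

```python
import numpy as np

# Hypothetical toy data and a hypothetical separating hyperplane (w, b).
X = np.array([[2.0], [1.0], [0.3], [-1.5]])   # samples
y = np.array([1, 1, 1, -1])                   # labels
w = np.array([1.0])
b = 0.0

# xi_i = max(0, 1 - y_i (w^T x_i + b)): zero beyond the margin,
# in (0, 1) inside the margin, >= 1 when misclassified.
xi = np.maximum(0.0, 1.0 - y * (X @ w + b))

errors = np.sum(y * (X @ w + b) < 0)          # number of misclassifications
print(xi)                                      # third sample lies inside the margin
print(xi.sum() >= errors)                      # 1_n^T xi upper-bounds the errors
```

Here only the third sample falls inside the margin (functional margin $0.3 < 1$), so it receives a positive slack while the others get zero.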
This formulation is called the primal problem. Moreover, it can be reformulated and solved more easily by using Lagrange multipliers, as presented in (Oneto and Greco 2010), in order to obtain the dual formulation. The solution starts from the Lagrangian of the primal formulation (Eq. (2.2)), where two sets of multipliers are introduced, linked to the first and second constraints: $\alpha \in \mathbb{R}^n$ and $\mu \in \mathbb{R}^n$, respectively. This is presented as:
$$
L_p(w, b, \xi) = \frac{1}{2} \|w\|^2 + C\, 1_n^T \xi - \sum_{i=1}^{n} \alpha_i \left[ y_i \left( w^T x_i + b \right) - 1 + \xi_i \right] - \sum_{i=1}^{n} \mu_i \xi_i,
\eqno(2.3)
$$
Following this, we can obtain the Karush-Kuhn-Tucker (KKT) conditions for the Wolfe dual problem (Karush 1939; Kuhn et al. 1951). They include the partial derivatives of the Lagrangian ($L_p$) with respect to $w$, $b$, and $\xi$, and slackness conditions for $\alpha$ and $\mu$.
$$
\frac{\partial L_p}{\partial w_j} = 0 \;\Rightarrow\; w_j = \sum_{i=1}^{n} \alpha_i y_i x_{i,j}, \quad j = 1, \ldots, d
\eqno(2.4)
$$
$$
\frac{\partial L_p}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{n} \alpha_i y_i = 0,
\eqno(2.5)
$$
$$
\frac{\partial L_p}{\partial \xi_i} = 0 \;\Rightarrow\; C - \alpha_i - \mu_i = 0, \quad i = 1, \ldots, n
\eqno(2.6)
$$
$$
y_i \left( w^T x_i + b \right) \ge 1 - \xi_i, \quad i = 1, \ldots, n
\eqno(2.7)
$$
$$
\alpha_i \left[ y_i \left( w^T x_i + b \right) - 1 + \xi_i \right] = 0, \quad i = 1, \ldots, n
\eqno(2.8)
$$
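These conditions can be checked numerically. The sketch below (an illustration under assumed data, not the thesis's own code) solves the dual problem of Eq. (2.2) for a toy dataset with a generic SLSQP solver, then recovers $w$ via the stationarity condition (2.4) and verifies conditions (2.5) and (2.6):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy dataset, linearly separable.
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
C, n = 1.0, len(y)

# Dual of Eq. (2.2): max sum_i alpha_i - (1/2) alpha^T Q alpha,
# with Q_ij = y_i y_j x_i^T x_j, 0 <= alpha_i <= C, sum_i alpha_i y_i = 0.
G = y[:, None] * X
Q = G @ G.T
dual = lambda a: 0.5 * a @ Q @ a - a.sum()    # negated dual objective (minimize)
res = minimize(dual, np.zeros(n), method="SLSQP",
               bounds=[(0.0, C)] * n,
               constraints={"type": "eq", "fun": lambda a: a @ y})
alpha = res.x

w = (alpha * y) @ X          # Eq. (2.4): w_j = sum_i alpha_i y_i x_{i,j}
mu = C - alpha               # Eq. (2.6): C - alpha_i - mu_i = 0
print(abs(alpha @ y) < 1e-6) # Eq. (2.5) holds at the optimum
print(np.all(mu >= -1e-8))   # mu stays non-negative, as required
```

At the solution, samples with $0 < \alpha_i < C$ are the support vectors lying exactly on the margin, consistent with the slackness condition (2.8).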