Information Technology Reference
In-Depth Information
Because maximum-margin hyper-plane is defined by weight vector W, it is easy
to recognize that the essence of constructing maximum-margin hyper-plane is to
solve the following constrained optimization problem:
1
|
W
|
2
⊗
≥
∀
Minimize
b
subject to
y
i
(W
T
X
i
+ b)
1
,
2
W
,
This constraints can be re-written as below:
1
2
Minimize
b
|
W
|
subject to 1 -
y
i
(
W
T
⊗
≤
∀
i
X
i
+ b
)
0
,
(4)
2
W
,
1
2
|
W
|
The reason of minimize
is that distance between two parallel hyper-
2
2
W
planes is
and we need to maximize such distance in order to maximize the
|
|
2
W
margin for maximum-margin hyper-plane. Then maximizing
is to
|
|
1
|
W
|
minimize
. Because |W| is the norm of W being complex to compute, we
2
1
1
2
|
W
|
|
W
|
substitute
with
. This is the
constrained optimization problem
or
2
2
quadratic programming
(QP)
optimization
problem. Note that |
W
|
2
can be
computed by scalar product of it and itself:
|
W
|
2
=
W
T
⊗
W
The way to solve QP problem in formula III.3.2 is through its Lagrange dual.
Suppose we want to minimize
f(x)
subject to
g(x)=0
, it exists a solution
x
0
to set of
equations:
∂
⎧
(
f
(
x
)
+
α
g
(
x
))
=
0
⎪
⎨
x
=
x
0
∂
x
⎪
⎩
g
(
x
)
=
0
where
is Lagrange multiplier.
For multi constraints g
i
(x) = 0 (i=
α
1
n
), there is a Lagrange multiplier for each
constraint
⎧
∂
n
(
f
(
x
)
+
∑
α
g
(
x
)
=
0
⎪
⎨
i
i
∂
x
i
=
1
x
=
x
0
⎪
⎩
g
(
x
)
=
0
i