Suppose λ* is a solution of (III.3.1) with condition (III.3.8). In other words, L(λ) is maximized at λ*; so the weight vector representing the maximum-margin hyper-plane is recovered from λ* and X_i:

W* = ∑_{i=1}^{n} λ_i* y_i X_i   (11)
So the bias b* is computed as below:

b* = y_i − W*^T ⊗ X_i   (12)
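Equations (11) and (12) can be sketched numerically. The data points and the Lagrange multipliers below are illustrative assumptions (a tiny two-point problem), not values from the text:

```python
import numpy as np

# Hypothetical toy data: two linearly separable points (illustration only).
X = np.array([[2.0, 2.0],   # positive example
              [0.0, 0.0]])  # negative example
y = np.array([1.0, -1.0])

# Suppose the dual optimization returned these multipliers (assumed values).
lam = np.array([0.25, 0.25])

# Equation (11): W* = sum_i lam_i * y_i * X_i
W = (lam * y) @ X

# Equation (12): b* = y_i - W*^T X_i, taken at a support vector (lam_i > 0)
i = int(np.argmax(lam > 0))
b = y[i] - W @ X[i]

print(W, b)  # -> [0.5 0.5] -1.0
```

With these multipliers the recovered hyper-plane satisfies W·X_i + b = ±1 on both points, as the margin conditions require.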
The rule for classification in (3) becomes:

f(X_i) = R = sign(W*^T ⊗ X_i + b*)   (13)
Fig. 3 Classification function f: X → y ∈ {-1, 1}
It means that whenever we need to determine to which class a new vector X_i belongs, we only have to substitute X_i into W*^T ⊗ X_i + b* and check the value of this expression. If the value is less than or equal to −1 (≤ −1), then X_i belongs to class y_i = −1. Otherwise, if the value is greater than or equal to 1 (≥ 1), then X_i belongs to class y_i = 1. Hence the function (W*^T ⊗ X_i + b*) is called the classification function or classification rule.
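The classification rule (13) is a one-line computation once W* and b* are known. In the sketch below the weight vector and bias are assumed values (matching a hypothetical small problem), not results from the chapter:

```python
import numpy as np

# Assumed weight vector and bias, as recovered by equations (11)-(12);
# the numbers are illustrative only.
W = np.array([0.5, 0.5])
b = -1.0

def classify(x):
    """Equation (13): f(X) = sign(W*^T X + b*)."""
    return int(np.sign(W @ x + b))

print(classify(np.array([3.0, 3.0])))   # value 0.5*3 + 0.5*3 - 1 = 2  -> class 1
print(classify(np.array([0.5, 0.5])))   # value 0.5 - 1 = -0.5          -> class -1
```

The sign of W*^T X + b* alone decides the class of a new vector; only training points are guaranteed to satisfy |W*^T X_i + b*| ≥ 1.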
The Lagrange multipliers are non-zero when W^T ⊗ X_i + b equals 1 or −1; the vectors X_i in this case are considered support vectors, since they are closest to the maximum-margin hyper-plane. These vectors lie on two parallel hyper-planes, so this approach is called a support vector machine.
Fig. 4
Support vectors
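The link between non-zero multipliers and the margin hyper-planes can be checked directly: the points with λ_i > 0 are exactly those where W^T X_i + b equals ±1. The three-point dataset and multipliers below are illustrative assumptions:

```python
import numpy as np

# Toy dual solution (assumed): only points with lam_i > 0 are support vectors.
X = np.array([[2.0, 2.0], [0.0, 0.0], [4.0, 4.0]])
y = np.array([1.0, -1.0, 1.0])
lam = np.array([0.25, 0.25, 0.0])   # third point: lam = 0, not a support vector

# Recover W* and b* via equations (11) and (12).
W = (lam * y) @ X
b = y[0] - W @ X[0]

# Support vectors lie on the two parallel margin hyper-planes,
# i.e. where W^T X_i + b equals +1 or -1.
margins = X @ W + b
support = np.isclose(np.abs(margins), 1.0)
print(margins)   # -> [ 1. -1.  3.]
print(support)   # -> [ True  True False]
```

The third point has margin 3 > 1: it lies strictly outside the margin, so its multiplier is zero and removing it would not change the hyper-plane.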