k : \mathcal{X} \times \mathcal{X} \to \mathbb{R},        (10.21)
(x, x') \mapsto k(x, x'),        (10.22)
which is a function that returns a real number characterizing the similarity between
x and x'. A type of similarity measure that is of particular interest is the dot product
performed in the feature space H:
k(x, x') := (\mathbf{x} \cdot \mathbf{x}') = (\Phi(x) \cdot \Phi(x')),        (10.23)
where the mapping function \Phi is applied to transfer the patterns x and x' into the
feature space H, i.e.,

\Phi : \mathcal{X} \to H,        (10.24)
x \mapsto \mathbf{x} := \Phi(x).        (10.25)
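As an illustration of Eqs. (10.23)-(10.25), the following Python sketch (not part of the original text; the feature map and sample values are chosen purely for illustration) checks that a degree-2 polynomial kernel evaluated directly on 2-D input patterns equals the ordinary dot product of the explicitly mapped patterns in a 3-D feature space H:

import numpy as np

def phi(x):
    # Hypothetical explicit feature map Phi for 2-D inputs: it spans the
    # 3-D feature space H of all degree-2 monomials of the components of x.
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

def k(x, xp):
    # Degree-2 polynomial kernel evaluated directly on the input patterns.
    return np.dot(x, xp) ** 2

x, xp = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(k(x, xp))                 # 16.0, computed in input space
print(np.dot(phi(x), phi(xp)))  # 16.0, the same value computed in H

In practice the kernel is evaluated directly, so the (possibly very high-dimensional) mapping \Phi never has to be computed explicitly.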
In order to design the learning algorithm, we must come up with a class of
functions, namely hyperplanes,

(w \cdot x) + b = 0, \qquad w \in \mathbb{R}^N,\ b \in \mathbb{R},        (10.26)
corresponding to decision functions,
f(x) = \mathrm{sgn}((w \cdot x) + b).        (10.27)
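A minimal sketch of the decision function in Eq. (10.27), assuming NumPy; the values of w and b are hypothetical:

import numpy as np

w = np.array([2.0, -1.0])  # normal vector of the separating hyperplane
b = -0.5                   # offset of the hyperplane

def f(x):
    # Predicted class label in {-1, +1} for the input pattern x.
    return int(np.sign(np.dot(w, x) + b))

print(f(np.array([1.0, 0.5])))  # +1, point on the positive side
print(f(np.array([0.0, 1.0])))  # -1, point on the negative side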
We can show that among all hyperplanes separating the data, there exists a unique
one yielding the maximum margin of separation between the classes,
\max_{(w, b)} \min \left\{ \|x - x_i\| \; ; \; x \in \mathbb{R}^N,\ (w \cdot x) + b = 0,\ i = 1, \ldots, m \right\},        (10.28)
where m is the number of data samples. This is the optimum hyperplane that has
the lowest capacity. This can be constructed by solving a constrained quadratic
optimization problem. The solution vector w has an expansion w = \sum_i v_i x_i in terms
of a subset of the training patterns, namely those whose v_i is non-zero, called
Support Vectors. These carry all the relevant information about the classification
problem; all remaining examples of the training set are irrelevant. Therefore, we
may rewrite Eq. (10.27) as:
f(x) = \mathrm{sgn}\left( \sum_i v_i \, (x \cdot x_i) + b \right).        (10.29)
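The expansion in Eq. (10.29) can be sketched as follows, assuming scikit-learn is available; the toy data set and the choice of a linear kernel are illustrative only. The fitted model exposes the support vectors x_i, the coefficients v_i (its dual coefficients), and the offset b, and the decision value for a new pattern is reproduced using only these quantities:

import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data with labels in {-1, +1}.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

v = clf.dual_coef_.ravel()   # coefficients v_i of the support vectors
sv = clf.support_vectors_    # the support vectors x_i
b = clf.intercept_[0]        # the offset b

x_new = np.array([2.0, 2.5])
# f(x) = sgn( sum_i v_i (x . x_i) + b ), using only the support vectors.
f_x = np.sign(v @ (sv @ x_new) + b)
print(f_x, np.sign(clf.decision_function([x_new])[0]))  # identical labels

The remaining training examples do not appear in the sum at all, which is exactly the property stated above.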
Equation (10.29) reveals the crucial property of the algorithm: the decision function
depends only on dot products between patterns. We think of the dot product space
as the feature space H. To express the formulas in terms of the input patterns lying