k : X × X → R,  (10.21)

(x, x′) ↦ k(x, x′),  (10.22)
which is a function that returns a real number characterizing the similarity between x and x′. A type of similarity measure that is of particular interest is the dot product performed in the feature space H,
k(x, x′) := (Φ(x) · Φ(x′)),  (10.23)
where the mapping function Φ is applied to map the patterns x and x′ into the feature space H, i.e.,
Φ : X → H,  (10.24)

x ↦ Φ(x).  (10.25)
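A small sketch can make the relation in Eq. (10.23) concrete. Assuming a homogeneous polynomial kernel of degree 2 on R^2 (a choice made here for illustration; the text does not fix a particular kernel), the explicit feature map Φ(x) = (x1², √2·x1·x2, x2²) satisfies k(x, x′) = (x · x′)² = Φ(x) · Φ(x′):

```python
import numpy as np

# Illustrative example (assumed kernel, not prescribed by the text):
# for the degree-2 homogeneous polynomial kernel on R^2,
# k(x, x') = (x . x')^2, an explicit feature map into H = R^3 is
# Phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2).

def phi(x):
    """Explicit feature map Phi : R^2 -> H = R^3."""
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

def k(x, xp):
    """Kernel evaluated directly in input space."""
    return np.dot(x, xp) ** 2

x, xp = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(k(x, xp))                 # kernel computed in input space
print(np.dot(phi(x), phi(xp)))  # same number via the dot product in H
```

Both calls print the same value, which is exactly the point of Eq. (10.23): the kernel evaluates the feature-space dot product without ever forming Φ(x) explicitly.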
In order to design the learning algorithm, we must come up with a class of functions: hyperplanes,

(w · x) + b = 0,  w ∈ R^N, b ∈ R,  (10.26)
corresponding to decision functions,
f(x) = sgn((w · x) + b).  (10.27)
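The decision function of Eq. (10.27) is a one-liner. A minimal sketch, with the hyperplane parameters w and b chosen arbitrarily for illustration:

```python
import numpy as np

# Minimal sketch of Eq. (10.27): f(x) = sgn((w . x) + b).
# The parameter values below are assumed for illustration only.

def f(x, w, b):
    """Decision function: sign of the signed distance-like quantity (w . x) + b."""
    return np.sign(np.dot(w, x) + b)

w, b = np.array([2.0, -1.0]), 0.5   # assumed hyperplane parameters

print(f(np.array([1.0, 1.0]), w, b))   # (2 - 1) + 0.5 = 1.5  -> class +1
print(f(np.array([-1.0, 1.0]), w, b))  # (-2 - 1) + 0.5 = -2.5 -> class -1
```

Points on opposite sides of the hyperplane (w · x) + b = 0 receive opposite labels, which is all the classifier reports.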
We can show that among all hyperplanes separating the data, there exists a unique
one yielding the maximum margin of separation between the classes,
max_{(w, b)} min { ‖x − x_i‖ ; x ∈ R^N, (w · x) + b = 0, i = 1, ..., m },  (10.28)
where m is the number of data samples. This is the optimum hyperplane, and it has the lowest capacity. It can be constructed by solving a constrained quadratic optimization problem. The solution vector w has an expansion w = ∑_i v_i x_i in terms of a subset of the training patterns, namely those whose coefficient v_i is non-zero, called Support Vectors. These carry all the relevant information about the classification problem; all remaining examples of the training set are irrelevant. Therefore, we may rewrite Eq. (10.27) as:
f(x) = sgn( ∑_i v_i (x · x_i) + b )  (10.29)
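The equivalence of Eq. (10.27) and Eq. (10.29) can be checked numerically. In this sketch the support vectors x_i, the coefficients v_i, and b are made-up numbers, not the output of an actual optimization:

```python
import numpy as np

# Hypothetical numbers: given an expansion w = sum_i v_i x_i over the
# support vectors, the primal decision function Eq. (10.27) and its
# expanded form Eq. (10.29) agree:
#   sgn((w . x) + b) = sgn(sum_i v_i (x . x_i) + b).

x_sv = np.array([[1.0, 2.0],
                 [2.0, -1.0]])   # assumed support vectors x_i
v = np.array([0.5, -1.0])        # assumed expansion coefficients v_i
b = 0.25
w = v @ x_sv                     # w = sum_i v_i x_i

x = np.array([3.0, 1.0])
lhs = np.sign(np.dot(w, x) + b)            # primal form, Eq. (10.27)
rhs = np.sign(np.sum(v * (x_sv @ x)) + b)  # expansion form, Eq. (10.29)
print(lhs, rhs)  # identical by linearity of the dot product
```

Because only dot products (x · x_i) appear in Eq. (10.29), replacing them with a kernel k(x, x_i) yields a non-linear classifier without ever computing w in the feature space.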
This shows the crucial property of the algorithm that the decision function
depends only on dot products between patterns. We think of the dot product space
as the feature space
H
. To express the formulas in terms of the input patterns lying