Moreover, the Karush–Kuhn–Tucker (KKT) conditions require that:

$$\mu_i \xi_i = 0, \quad i = 1, \ldots, n \qquad (2.9)$$

$$(C - \alpha_i)\, \xi_i = 0, \quad i = 1, \ldots, n \qquad (2.10)$$

$$\alpha_i \ge 0, \quad \mu_i \ge 0, \quad \xi_i \ge 0, \quad i = 1, \ldots, n \qquad (2.11)$$

where (2.10) follows from (2.9) through the stationarity relation $C = \alpha_i + \mu_i$.
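A standard consequence of these conditions (our summary; it also uses the stationarity relation $C = \alpha_i + \mu_i$ and the complementary slackness condition on $\alpha_i$ from the full KKT system) is that the value of $\alpha_i$ identifies the role of each training point:

```latex
% Role of each training point, derived from the KKT conditions:
\begin{align*}
\alpha_i = 0     &\;\Rightarrow\; \mu_i = C > 0,\ \xi_i = 0,\ y_i(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1
  && \text{(outside the margin)} \\
0 < \alpha_i < C &\;\Rightarrow\; \mu_i > 0,\ \xi_i = 0,\ y_i(\mathbf{w}^{\top}\mathbf{x}_i + b) = 1
  && \text{(exactly on the margin)} \\
\alpha_i = C     &\;\Rightarrow\; \xi_i \ge 0,\ y_i(\mathbf{w}^{\top}\mathbf{x}_i + b) \le 1
  && \text{(inside the margin or misclassified)}
\end{align*}
```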
The dual form is obtained as:
$$\min_{\alpha} \; \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, \mathbf{x}_i^{\top} \mathbf{x}_j \;-\; \sum_{i=1}^{n} \alpha_i$$

$$\text{s.t.} \quad 0 \le \alpha_i \le C, \quad i = 1, \ldots, n, \qquad \sum_{i=1}^{n} y_i \alpha_i = 0 \qquad (2.12)$$
which can be written more compactly in matrix form as:
$$\min_{\alpha} \; \frac{1}{2} \alpha^{\top} Q \, \alpha \;-\; \mathbf{1}_n^{\top} \alpha$$

$$\text{s.t.} \quad \mathbf{y}^{\top} \alpha = 0, \qquad 0 \le \alpha_i \le C, \quad i \in \{1, \ldots, n\} \qquad (2.13)$$
where $Q$ is the kernel matrix, a symmetric positive semidefinite $n \times n$ matrix with entries $q_{ij} = y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)$. Here $K$ is the kernel function, which makes it possible to solve both linear and non-linear problems.
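To make (2.13) concrete, the following sketch (ours, not the book's; the toy data, the linear kernel choice, and the use of SciPy's SLSQP solver are all illustrative assumptions) builds $Q$ and solves the dual for a small separable dataset:

```python
import numpy as np
from scipy.optimize import minimize

# Toy, linearly separable binary data (illustrative assumption).
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0],
              [-1.0, -1.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
n, C = len(y), 1.0

# Kernel matrix Q with q_ij = y_i y_j K(x_i, x_j); K is linear here.
Q = np.outer(y, y) * (X @ X.T)

# Dual objective of (2.13): (1/2) a^T Q a - 1_n^T a, with its gradient.
objective = lambda a: 0.5 * a @ Q @ a - a.sum()
gradient = lambda a: Q @ a - np.ones(n)

# Constraints: y^T a = 0 (equality) and box bounds 0 <= a_i <= C.
constraint = {'type': 'eq', 'fun': lambda a: y @ a, 'jac': lambda a: y}
res = minimize(objective, np.zeros(n), jac=gradient,
               bounds=[(0.0, C)] * n, constraints=[constraint],
               method='SLSQP')
alpha = res.x

# Recover w from the stationarity condition w = sum_i a_i y_i x_i,
# and b from any margin support vector (0 < a_i < C, so xi_i = 0).
w = (alpha * y) @ X
on_margin = (alpha > 1e-6) & (alpha < C - 1e-6)
b = np.mean(y[on_margin] - X[on_margin] @ w)
print(alpha.round(4), w.round(4), round(b, 4))
```

In this example only the two closest points across the classes end up with non-zero $\alpha_i$, matching the support-vector interpretation given after (2.11).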
This is possible through an implicit mapping of the inputs known as the kernel trick. The idea was first introduced in (Aizerman et al. 1964) and it assumes the existence of a function $\Phi(\mathbf{x})$ which maps a sample from its original space to a feature space $\mathcal{H}$ which is higher-dimensional (with potentially infinite dimensions), and it is defined as:

$$\Phi(\cdot) : \mathbb{R}^{d} \rightarrow \mathcal{H} \qquad (2.14)$$
In a binary classification problem, if samples are mapped into $\mathcal{H}$, they are more likely to be linearly separable. Therefore, the use of a linear SVM to solve it appears to be practical. Moreover, from its formulation, we can also notice that it is not necessary to explicitly compute the $\Phi(\mathbf{x})$ function but rather the inner product between two elements. This is known as the kernel function and its formulation is:

$$K(\mathbf{a}, \mathbf{b}) = \Phi(\mathbf{a}) \cdot \Phi(\mathbf{b}) \qquad (2.15)$$
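As a quick numerical illustration of (2.15) (ours, not the book's): for the homogeneous degree-2 polynomial kernel $K(\mathbf{a}, \mathbf{b}) = (\mathbf{a} \cdot \mathbf{b})^2$ on $\mathbb{R}^2$, the explicit map is $\Phi(\mathbf{x}) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)$, and the two sides of (2.15) agree:

```python
import numpy as np

def phi(x):
    # Explicit feature map of the degree-2 polynomial kernel on R^2:
    # K(a, b) = (a . b)^2  <=>  phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2).
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

a, b = np.array([1.0, 2.0]), np.array([3.0, -1.0])
lhs = (a @ b) ** 2        # kernel computed in the original input space
rhs = phi(a) @ phi(b)     # inner product computed in the feature space H
print(lhs, rhs)           # both print 1.0: K(a, b) = phi(a) . phi(b)
```

The kernel evaluation costs one inner product in $\mathbb{R}^2$, while the right-hand side requires constructing the (possibly much larger) feature vectors, which is exactly what the kernel trick avoids.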
Its only requirement is that it must satisfy Mercer's theorem (Cristianini and Shawe-Taylor 2000), which represents a symmetric positive-definite function on a square as a sum of a convergent sequence of product functions.
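A practical, finite-sample consequence of Mercer's condition is that the Gram (kernel) matrix evaluated on any set of points must be symmetric positive semidefinite. The sketch below (our illustration, with an assumed RBF kernel and $\gamma = 0.5$) checks this via the eigenvalues of the Gram matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))          # arbitrary sample points

# RBF kernel K(a, b) = exp(-gamma * ||a - b||^2) with gamma = 0.5 (assumed).
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
G = np.exp(-0.5 * sq_dists)           # Gram matrix g_ij = K(x_i, x_j)

# Mercer implies G is symmetric PSD: all eigenvalues are non-negative
# (up to floating-point noise).
eigvals = np.linalg.eigvalsh(G)
print(eigvals.min() >= -1e-10)        # True
```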