\[
\mu_i \xi_i = 0, \quad \forall i = 1, \ldots, n \tag{2.9}
\]
\[
(C - \alpha_i)\, \xi_i = 0, \quad \forall i = 1, \ldots, n \tag{2.10}
\]
\[
\alpha_i,\ \xi_i,\ \mu_i \geq 0, \quad \forall i = 1, \ldots, n \tag{2.11}
\]
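These conditions, together with the stationarity of the Lagrangian, eliminate the primal variables; a sketch of the standard step, assuming the usual soft-margin primal \(\min_{w,b,\xi} \frac{1}{2}\|w\|^{2} + C \sum_{i} \xi_i\):

\[
\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{n} \alpha_i y_i x_i, \qquad
\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{n} \alpha_i y_i = 0, \qquad
\frac{\partial L}{\partial \xi_i} = 0 \;\Rightarrow\; C - \alpha_i - \mu_i = 0.
\]

Substituting these back into the Lagrangian eliminates \(w\), \(b\) and \(\xi\), leaving a problem in \(\alpha\) alone.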
The dual form is obtained as:
\[
\begin{aligned}
\min_{\alpha} \quad & \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, x_i^{T} x_j - \sum_{i=1}^{n} \alpha_i \\
\text{s.t.} \quad & 0 \leq \alpha_i \leq C, \quad \forall i = 1, \ldots, n \\
& \sum_{i=1}^{n} y_i \alpha_i = 0
\end{aligned} \tag{2.12}
\]
which can be further simplified in matrix form as:
\[
\begin{aligned}
\min_{\alpha} \quad & \frac{1}{2} \alpha^{T} Q \alpha - \mathbf{1}_n^{T} \alpha \\
\text{s.t.} \quad & 0 \leq \alpha_i \leq C, \quad \forall i \in \{1, \ldots, n\} \\
& y^{T} \alpha = 0
\end{aligned} \tag{2.13}
\]
where \(Q\) is the kernel matrix, a symmetric positive semidefinite \(n \times n\) matrix with entries \(q_{ij} = y_i y_j K(x_i, x_j)\), and \(K(\cdot, \cdot)\) is the kernel function, which makes it possible to solve both linear and non-linear problems.
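To make the quadratic program concrete, here is a minimal sketch (Python with NumPy/SciPy; the helper name solve_svm_dual, the toy data and the choice of the SLSQP solver are illustrative assumptions, not from the text) that builds \(Q\) from a kernel and solves (2.13) with an off-the-shelf constrained optimizer:

```python
import numpy as np
from scipy.optimize import minimize

def solve_svm_dual(X, y, C, kernel=np.dot):
    """Illustrative solver for the dual QP (2.13):
    min 1/2 a^T Q a - 1^T a   s.t.  0 <= a_i <= C,  y^T a = 0."""
    n = len(y)
    # Kernel matrix entries q_ij = y_i y_j K(x_i, x_j)
    K = np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])
    Q = np.outer(y, y) * K

    objective = lambda a: 0.5 * a @ Q @ a - np.sum(a)
    gradient = lambda a: Q @ a - np.ones(n)

    result = minimize(
        objective, x0=np.zeros(n), jac=gradient, method="SLSQP",
        bounds=[(0.0, C)] * n,                               # box constraint 0 <= a_i <= C
        constraints={"type": "eq", "fun": lambda a: y @ a},  # equality constraint y^T a = 0
    )
    return result.x  # optimal Lagrange multipliers alpha

# Toy usage: two linearly separable points
X = np.array([[2.0, 2.0], [-2.0, -2.0]])
y = np.array([1.0, -1.0])
alphas = solve_svm_dual(X, y, C=1.0)
```

In practice, SVM libraries solve this QP with specialized methods such as SMO; a general-purpose optimizer is shown here only because it maps directly onto (2.13).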
This is possible through an implicit mapping of the inputs known as the kernel trick.
The idea was first introduced in (Aizerman et al. 1964) and it assumes the existence of a function \(\phi(x)\) which maps a sample from its original space to a feature space \(\mathcal{H}\), which is higher-dimensional (with potentially infinite dimensions), and is defined as:

\[
\phi(\cdot) : \mathbb{R}^{d} \to \mathcal{H} \tag{2.14}
\]
In a binary classification problem, if samples are mapped into \(\mathcal{H}\), they are more likely to be linearly separable. Therefore, the use of a linear SVM to solve it appears to be practical. Moreover, from its formulation, we can also notice that it is not necessary to explicitly compute the \(\phi\) function but rather the inner product between two elements. This is known as the kernel function and its formulation is:
\[
K(a, b) = \phi(a) \cdot \phi(b) \tag{2.15}
\]
Its only requirement is that it must satisfy Mercer's theorem (Cristianini and Shawe-Taylor 2000), which represents a symmetric positive-definite function on a square by means of a sum of a convergent sequence of product functions.
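As a concrete illustration of (2.15), the sketch below (Python with NumPy; the degree-2 polynomial kernel and the explicit map phi are illustrative choices, not from the text) verifies that the kernel returns the inner product in the feature space without ever computing \(\phi\):

```python
import numpy as np

def phi(x):
    """Explicit feature map of the homogeneous degree-2 polynomial
    kernel on R^2: phi(x) = (x1^2, x2^2, sqrt(2)*x1*x2)."""
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2.0) * x1 * x2])

def poly_kernel(a, b):
    """K(a, b) = (a . b)^2, computed without touching phi."""
    return np.dot(a, b) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, 0.5])

print(poly_kernel(a, b))        # 16.0
print(np.dot(phi(a), phi(b)))   # 16.0 -- same value, as (2.15) states
```

For kernels such as the RBF, \(\mathcal{H}\) is infinite-dimensional and \(\phi\) cannot be written out at all, which is precisely why the implicit computation matters.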