[Fig. 7.4 graphic: two plots, axes labelled $x_1$, ticks from 0 to 1]
Fig. 7.4 Solving of an exemplary two-class problem by mapping into a higher-dimensional space: while the problem cannot be solved linearly in the one-dimensional (original) space, mapping by the function $\Phi: x_1 \rightarrow (x_1, x_1^2)$ allows for error-free separation in the new two-dimensional space [1]
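As a minimal sketch of the idea in Fig. 7.4 (the sample values and the separating threshold below are assumed for illustration, not taken from the figure):

```python
import numpy as np

# Hypothetical 1-D two-class data: class +1 clusters around 0,
# class -1 lies on both sides -- no single threshold on x1 separates them.
x1 = np.array([-1.0, -0.8, 0.9, 1.1, -0.1, 0.0, 0.2])
y = np.array([-1, -1, -1, -1, 1, 1, 1])

# Map each sample into the 2-D feature space: Phi(x1) = (x1, x1^2)
phi = np.column_stack([x1, x1 ** 2])

# In the new space, the horizontal line x2 = 0.5 (hand-picked here)
# separates the classes without error:
pred = np.where(phi[:, 1] < 0.5, 1, -1)
print(np.all(pred == y))  # True: error-free separation
```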
The normal vector $\mathbf{w}$ then results in

$$
\mathbf{w} = \sum_{l:\, a_l > 0} a_l\, y_l\, \Phi(\mathbf{x}_l). \qquad (7.23)
$$
Applying $\Phi$, the decision function $d_{\mathbf{w},b}(\mathbf{x})$ results in:

$$
d_{\mathbf{w},b}(\mathbf{x}) = \operatorname{sgn}\left( \mathbf{w}^T \Phi(\mathbf{x}) + b \right). \qquad (7.24)
$$
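A minimal sketch of (7.23) and (7.24) with an explicit feature map; the multipliers $a_l$, labels $y_l$, support vectors $\mathbf{x}_l$, and bias $b$ below are assumed placeholders, not values from the text:

```python
import numpy as np

def phi(x):
    """Assumed feature map, Phi: x1 -> (x1, x1^2) as in Fig. 7.4."""
    return np.array([x, x ** 2])

# Hypothetical support vectors (those with a_l > 0), labels, and bias:
a = np.array([0.7, 0.5, 1.2])      # Lagrange multipliers a_l
y = np.array([1, -1, 1])           # class labels y_l
x_sv = np.array([0.1, 1.0, -0.2])  # support vectors x_l
b = 0.3

# Normal vector per (7.23): w = sum over l of a_l * y_l * Phi(x_l)
w = sum(a_l * y_l * phi(x_l) for a_l, y_l, x_l in zip(a, y, x_sv))

# Decision function per (7.24): d(x) = sgn(w^T Phi(x) + b)
def d(x):
    return np.sign(w @ phi(x) + b)

print(d(0.0), d(2.0))  # e.g. 1.0 -1.0 for these placeholder values
```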
As

$$
\mathbf{w}^T \Phi(\mathbf{x}) = \sum_{l:\, a_l > 0} a_l\, y_l\, \Phi(\mathbf{x}_l)^T \Phi(\mathbf{x}), \qquad (7.25)
$$
the transformation $\Phi$ is explicitly needed neither for the estimation of the classifier's parameters nor for the classification. Instead, a so-called 'kernel function' $K_\Phi(\mathbf{x}, \mathbf{x}')$ is defined, with the condition

$$
K_\Phi(\mathbf{x}, \mathbf{x}') = \Phi(\mathbf{x})^T \Phi(\mathbf{x}'). \qquad (7.26)
$$
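Substituting (7.26) into (7.25) makes this explicit (stated here for clarity; it follows directly from the equations above): the decision function can be evaluated entirely through the kernel, without ever computing $\Phi$:

$$
d_{\mathbf{w},b}(\mathbf{x}) = \operatorname{sgn}\Big( \sum_{l:\, a_l > 0} a_l\, y_l\, K_\Phi(\mathbf{x}_l, \mathbf{x}) + b \Big).
$$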
The kernel function additionally needs to be positive semi-definite and symmetric, and to fulfil the Cauchy-Schwarz inequality. The optimal kernel function for a given classification or regression problem can only be found empirically. Recently, however, so-called multi-kernels have been proposed to overcome the search for an optimal kernel function [11].
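As a minimal sketch of how the first two conditions can be checked numerically for a candidate kernel (the samples below are assumed for illustration; the kernel is the polynomial kernel of (7.27) below): a valid kernel must yield a symmetric Gram matrix whose eigenvalues are all non-negative.

```python
import numpy as np

def poly_kernel(x, x_prime, p=2):
    """Polynomial kernel as in (7.27): K_p(x, x') = (x^T x' + 1)^p."""
    return (np.dot(x, x_prime) + 1.0) ** p

# Hypothetical training samples (one per row):
X = np.array([[0.0, 1.0],
              [1.0, 0.5],
              [-0.5, 0.2]])

# Gram matrix K[i, j] = K_p(x_i, x_j) over all sample pairs
K = np.array([[poly_kernel(xi, xj) for xj in X] for xi in X])

print(np.allclose(K, K.T))                      # symmetric
print(np.all(np.linalg.eigvalsh(K) >= -1e-10))  # eigenvalues >= 0 (PSD)
```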
Most frequently used kernel functions comprise:

• Polynomial kernel:

$$
K_p(\mathbf{x}, \mathbf{x}') = \left( \mathbf{x}^T \mathbf{x}' + 1 \right)^p, \qquad (7.27)
$$

where $p$ is the polynomial order,