are penalized in the objective function via the regularization parameter C, chosen a priori. In the ν-SVM, the size of ε is not defined a priori but is itself a variable. Its value is traded off against model complexity and the slack variables via a constant ν ∈ (0, 1]:
\[
\text{minimize} \quad \tau(\mathbf{w}, \boldsymbol{\xi}^{(*)}, \varepsilon) = \frac{1}{2}\,(\mathbf{w} \cdot \mathbf{w}) + C \cdot \left( \nu\varepsilon + \frac{1}{l} \sum_{i=1}^{l} (\xi_i + \xi_i^{*}) \right) \qquad (4.41)
\]
subject to the constraints 4.38-4.40.
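As a concrete illustration, here is a minimal sketch using scikit-learn's NuSVR, an off-the-shelf implementation of this ν-formulation; the toy data, the kernel choice, and all parameter values are assumptions made for the example, not taken from the text.

import numpy as np
from sklearn.svm import NuSVR

# Toy 1-D regression data (assumed for the example).
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sinc(X).ravel() + 0.1 * rng.standard_normal(200)

# epsilon is not fixed a priori: the optimizer trades it off against
# model complexity and the slack variables via nu in (0, 1]. nu upper-
# bounds the fraction of training errors and lower-bounds the fraction
# of support vectors.
model = NuSVR(nu=0.5, C=1.0, kernel="linear")
model.fit(X, y)
print(f"fraction of support vectors: {len(model.support_) / len(X):.2f}")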
Using Lagrange multiplier techniques, one can show [17] that the minimization of Eq. (4.37) under the constraints 4.38-4.40 results in a convex optimization problem with a global minimum. The same is true for the optimization problem 4.41 under the constraints 4.38-4.40. At the optimum, the regression estimate can be shown to take the form
\[
f(\mathbf{x}) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^{*})\, (\mathbf{x}_i \cdot \mathbf{x}) + b \qquad (4.42)
\]
In most cases, only a subset of the coefficients (α_i − α_i*) will be nonzero. The corresponding examples x_i are termed support vectors (SVs). The coefficients and the SVs, as well as the offset b, are computed by the ν-SVM algorithm.
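Continuing the sketch above (still an illustration, not the text's own algorithm), scikit-learn exposes exactly these quantities after fitting: the support vectors, the nonzero coefficients (α_i − α_i*), and the offset b, so the linear expansion of Eq. (4.42) can be reproduced by hand.

sv = model.support_vectors_      # the x_i with nonzero coefficients
coef = model.dual_coef_.ravel()  # (alpha_i - alpha_i^*) for each SV
b = model.intercept_[0]          # offset b

# Evaluate Eq. (4.42) directly and compare with the library prediction.
x_new = np.array([0.5])
f_manual = coef @ (sv @ x_new) + b
assert np.isclose(f_manual, model.predict(x_new[None, :])[0])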
In order to move from linear (as in Eq. 4.42) to nonlinear functions, the following generalization can be made: we map the input vectors x_i into a high-dimensional feature space Z through some a priori chosen nonlinear mapping Φ : X → Z. We then solve the optimization problem 4.41 in the feature space Z. In this case, the inner product of the input vectors (x_i · x) in Eq. (4.42) is replaced by the inner product of their images in the feature space Z. The calculation of the inner product in a high-dimensional space is computationally very expensive. Nevertheless, under general conditions (see [17] and references therein) these expensive calculations can be reduced significantly by using a suitable function k such that
\[
(\Phi(\mathbf{x}_i) \cdot \Phi(\mathbf{x})) = k(\mathbf{x}_i, \mathbf{x}), \qquad (4.43)
\]
leading to nonlinear regression functions of the form:
\[
f(\mathbf{x}) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^{*})\, k(\mathbf{x}_i, \mathbf{x}) + b \qquad (4.44)
\]
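The kernel substitution of Eqs. (4.43)-(4.44) can be sketched by refitting the earlier toy model with a Gaussian (RBF) kernel and reproducing its predictions from kernel evaluations over the support vectors alone. Note that scikit-learn parameterizes this kernel as exp(−γ‖x − y‖²), i.e. γ = 1/(2σ²_kernel), and the value γ = 0.5 is an assumption for the example.

from sklearn.metrics.pairwise import rbf_kernel

model_rbf = NuSVR(nu=0.5, C=1.0, kernel="rbf", gamma=0.5).fit(X, y)
sv = model_rbf.support_vectors_
coef = model_rbf.dual_coef_.ravel()           # (alpha_i - alpha_i^*)
b = model_rbf.intercept_[0]

# Eq. (4.44): f(x) = sum_i (alpha_i - alpha_i^*) k(x_i, x) + b
x_new = np.array([[0.5]])
K = rbf_kernel(sv, x_new, gamma=0.5).ravel()  # k(x_i, x) for each SV
f_manual = coef @ K + b
assert np.isclose(f_manual, model_rbf.predict(x_new)[0])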
The nonlinear function k is called a kernel [17]. We mostly use a Gaussian kernel
\[
k(\mathbf{x}, \mathbf{y}) = \exp\left( -\|\mathbf{x} - \mathbf{y}\|^{2} / (2\sigma_{\mathrm{kernel}}^{2}) \right) \qquad (4.45)
\]
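For completeness, Eq. (4.45) transcribes directly into NumPy; sigma_kernel is the bandwidth, and the γ used in the RBF sketch above corresponds to 1/(2 σ_kernel²).

def gaussian_kernel(x, y, sigma_kernel=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 sigma_kernel^2)), Eq. (4.45)
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma_kernel ** 2))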