The objective function to be optimized in the dual problem takes the following form
$$Q(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j d_i d_j \mathbf{x}_i^{T} \mathbf{x}_j \qquad (6.1)$$
subject to the constraints
$$\sum_{i=1}^{N} \alpha_i d_i = 0$$
$$\alpha_i \geq 0 \qquad \text{for } i = 1, 2, \ldots, N$$
where $\alpha_i$ in (6.1) are the Lagrange multipliers, and the training data sample is denoted by $\{\mathbf{x}_i, d_i\}_{i=1}^{N}$, with $\mathbf{x}_i$ denoting an input vector, $d_i$ denoting the corresponding desired response, and N denoting the sample size.
Careful examination of (6.1) reveals an essential ingredient of SVM learning, namely the requirement to solve a quadratic programming problem, which can become computationally demanding, particularly when the sample size N is large.
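To make the quadratic-programming nature of (6.1) concrete, the following sketch solves the dual problem for a tiny linearly separable sample with a general-purpose solver; the toy data, the linear inner-product kernel, and the use of SciPy's SLSQP routine are illustrative assumptions rather than anything prescribed here.

```python
# Sketch: solving the dual problem (6.1) with a generic QP-capable solver.
# Toy data, linear kernel, and SciPy's SLSQP are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable sample {x_i, d_i}, with d_i in {-1, +1}
X = np.array([[2.0, 2.0], [2.5, 3.0], [-1.0, -1.5], [-2.0, -1.0]])
d = np.array([1.0, 1.0, -1.0, -1.0])
N = len(d)

# Matrix of the products d_i d_j x_i^T x_j from the quadratic term of (6.1)
H = (d[:, None] * d[None, :]) * (X @ X.T)

def neg_dual(alpha):
    # Negative of Q(alpha); minimizing it maximizes the dual objective
    return 0.5 * alpha @ H @ alpha - alpha.sum()

constraints = {"type": "eq", "fun": lambda a: a @ d}  # sum_i alpha_i d_i = 0
bounds = [(0.0, None)] * N                            # alpha_i >= 0

res = minimize(neg_dual, np.zeros(N), method="SLSQP",
               bounds=bounds, constraints=constraints)
alpha = res.x

# Support vectors are the training points with nonzero alpha_i
sv = alpha > 1e-6
w = ((alpha * d)[:, None] * X).sum(axis=0)  # w = sum_i alpha_i d_i x_i
print("alpha:", np.round(alpha, 4))
print("support vectors:", np.where(sv)[0], "weight vector:", np.round(w, 4))
```

Even at this scale the solver manipulates a dense N-by-N matrix of the products $d_i d_j \mathbf{x}_i^T \mathbf{x}_j$, which suggests why a generic quadratic-programming routine becomes expensive as the sample size N grows.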
What we have addressed thus far is the optimal solution to a linearly separable
pattern-classification problem. To extend the SVM learning theory to the more difficult
case of nonseparable pattern-classification problems, a set of variables called slack
variables is introduced into the convex optimization problem; these new variables
measure the deviation of a point from the ideal condition of perfect pattern separability.
It turns out that in mathematical terms, the impact of the slack variables is merely to modify the second constraint of the dual optimization problem described in (6.1) as follows
$$0 \leq \alpha_i \leq C \qquad \text{for } i = 1, 2, \ldots, N$$
where C is a user-specified positive parameter. Except for this modification, the formulation of the dual optimization problem for nonseparable pattern-classification problems remains intact. Moreover, the support vectors are defined in the same way as before. The new parameter C controls the tradeoff between the complexity of the machine and the number of nonseparable patterns; it may therefore be viewed as the reciprocal of a parameter commonly referred to as the regularization parameter. When parameter C is assigned a large value, the implication is that there is high confidence in the practical quality of the supervisory training data. Conversely, when C is assigned a small value, the training data sample is considered to be noisy, suggesting that less emphasis should be placed on its use and more attention be given to the adjustable parameter vector.
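As a rough illustration of this tradeoff, the short sketch below fits a soft-margin SVM on two overlapping (nonseparable) classes with a small and a large value of C, using scikit-learn's SVC, whose C argument plays the soft-margin role described above; the synthetic data and the particular values of C are illustrative assumptions.

```python
# Sketch: effect of the soft-margin parameter C, using scikit-learn's SVC.
# The synthetic two-class data and the chosen C values are illustrative.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two noisy, overlapping classes: a nonseparable pattern-classification problem
X = np.vstack([rng.normal(-1.0, 1.2, size=(50, 2)),
               rng.normal(+1.0, 1.2, size=(50, 2))])
d = np.array([-1] * 50 + [+1] * 50)

for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, d)
    # Small C: data treated as noisy, many (bounded) support vectors.
    # Large C: high confidence in the data, fewer margin violations tolerated.
    print(f"C={C:>6}: {clf.n_support_.sum()} support vectors, "
          f"training accuracy {clf.score(X, d):.2f}")
```

With a small C the multipliers are capped at a low value, so many training points end up as bounded support vectors; with a large C the machine tries harder to classify every training point correctly, at the cost of a more complex decision boundary.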
Thus far in the discussion, we have focused attention on an algorithmic formulation of the SVM. Clearly, we need a network structure for its implementation. In this context, the radial-basis function (RBF) network provides a popular choice. Figure 6.3 depicts the structure of an RBF network, in which the nonlinear hidden layer of size K (smaller than the sample size N) consists of Gaussian units; nonlinearity of the hidden layer is needed to account for nonseparability of the input patterns. Each Gaussian unit is