The objective function to be optimized in the dual problem takes the following form
$$Q(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j d_i d_j \mathbf{x}_i^{T} \mathbf{x}_j \qquad (6.1)$$
subject to the constraints
$$\sum_{i=1}^{N} \alpha_i d_i = 0$$
$$\alpha_i \geq 0 \qquad \text{for } i = 1, 2, \ldots, N$$
where $\alpha_i$ in (6.1) are the Lagrange multipliers, and the training data sample is denoted by $\{\mathbf{x}_i, d_i\}_{i=1}^{N}$, with $\mathbf{x}_i$ denoting an input vector, $d_i$ denoting the corresponding desired response, and N denoting the sample size.
Careful examination of (6.1) reveals an essential ingredient of SVM learning, namely the requirement to solve a quadratic programming problem, which can become computationally demanding, particularly when the sample size N is large.
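To make the quadratic-programming nature of (6.1) concrete, the following sketch solves the dual problem for a tiny linearly separable sample with a general-purpose solver; the toy data, the linear inner-product kernel, and the use of SciPy's SLSQP routine are illustrative assumptions rather than anything prescribed here.

```python
# Sketch: solving the dual problem (6.1) with a generic QP-capable solver.
# Toy data, linear kernel, and SciPy's SLSQP are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable sample {x_i, d_i}, with d_i in {-1, +1}
X = np.array([[2.0, 2.0], [2.5, 3.0], [-1.0, -1.5], [-2.0, -1.0]])
d = np.array([1.0, 1.0, -1.0, -1.0])
N = len(d)

# Matrix of the products d_i d_j x_i^T x_j from the quadratic term of (6.1)
H = (d[:, None] * d[None, :]) * (X @ X.T)

def neg_dual(alpha):
    # Negative of Q(alpha); minimizing it maximizes the dual objective
    return 0.5 * alpha @ H @ alpha - alpha.sum()

constraints = {"type": "eq", "fun": lambda a: a @ d}  # sum_i alpha_i d_i = 0
bounds = [(0.0, None)] * N                            # alpha_i >= 0

res = minimize(neg_dual, np.zeros(N), method="SLSQP",
               bounds=bounds, constraints=constraints)
alpha = res.x

# Support vectors are the training points with nonzero alpha_i
sv = alpha > 1e-6
w = ((alpha * d)[:, None] * X).sum(axis=0)  # w = sum_i alpha_i d_i x_i
print("alpha:", np.round(alpha, 4))
print("support vectors:", np.where(sv)[0], "weight vector:", np.round(w, 4))
```

Even at this scale the solver manipulates a dense N-by-N matrix of the products $d_i d_j \mathbf{x}_i^T \mathbf{x}_j$, which suggests why a generic quadratic-programming routine becomes expensive as the sample size N grows.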
What we have addressed thus far is the optimal solution to a linearly separable
pattern-classification problem. To extend the SVM learning theory to the more difficult
case of nonseparable pattern-classification problems, a set of variables called slack
variables is introduced into the convex optimization problem; these new variables
measure the deviation of a point from the ideal condition of perfect pattern separability.
It turns out that in mathematical terms, the impact of the slack variables is merely to modify the second constraint of the dual optimization problem described in (6.1) as follows
$$0 \leq \alpha_i \leq C \qquad \text{for } i = 1, 2, \ldots, N$$
where C is a user-specified positive parameter. Except for this modification, the formulation of the dual optimization problem for nonseparable pattern-classification problems remains intact. Moreover, the support vectors are defined in the same way as before. The new parameter C controls the tradeoff between the complexity of the machine and the number of nonseparable patterns; it may therefore be viewed as the reciprocal of a parameter commonly referred to as the regularization parameter. When parameter C is assigned a large value, the implication is that there is high confidence in the practical quality of the supervisory training data. Conversely, when C is assigned a small value, the training data sample is considered to be noisy, suggesting that less emphasis should be placed on its use and more attention be given to the adjustable parameter vector.
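As a rough illustration of this tradeoff, the short sketch below fits a soft-margin SVM on two overlapping (nonseparable) classes with a small and a large value of C, using scikit-learn's SVC, whose C argument plays the soft-margin role described above; the synthetic data and the particular values of C are illustrative assumptions.

```python
# Sketch: effect of the soft-margin parameter C, using scikit-learn's SVC.
# The synthetic two-class data and the chosen C values are illustrative.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two noisy, overlapping classes: a nonseparable pattern-classification problem
X = np.vstack([rng.normal(-1.0, 1.2, size=(50, 2)),
               rng.normal(+1.0, 1.2, size=(50, 2))])
d = np.array([-1] * 50 + [+1] * 50)

for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, d)
    # Small C: data treated as noisy, many (bounded) support vectors.
    # Large C: high confidence in the data, fewer margin violations tolerated.
    print(f"C={C:>6}: {clf.n_support_.sum()} support vectors, "
          f"training accuracy {clf.score(X, d):.2f}")
```

With a small C the multipliers are capped at a low value, so many training points end up as bounded support vectors; with a large C the machine tries harder to classify every training point correctly, at the cost of a more complex decision boundary.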
Thus far in the discussion, we have focused attention on an algorithmic formulation of the SVM. Clearly, we need a network structure for its implementation. In this context, the radial-basis function (RBF) network provides a popular choice. Figure 6.3 depicts the structure of an RBF network, in which the nonlinear hidden layer of size K (smaller than the sample size N) consists of Gaussian units; nonlinearity of the hidden layer is needed to account for nonseparability of the input patterns. Each Gaussian unit is