6.2.2 Support Vector Machine
The support vector machine (SVM) is basically a binary classification machine
with some highly elegant properties. It was pioneered by Vapnik; the first
description of the machine was presented in [3], and detailed treatments appear
in [1, 2].
Unlike the BP algorithm, supervised training of the SVM looks to optimization
theory for its derivation. To develop insight into SVM theory, the formulation of the
learning algorithm begins with the simplest case: a pair of linearly separable
pattern classes. In such a setting, the objective is to maximize the margin of separation
between the two classes, given a set of training data made up of input vector-desired
response pairs. Appealing to statistical learning theory, we find that the maximization
of the margin of separation is equivalent to minimizing the Euclidean norm of
the adjustable parameter vector. Building on this important finding, the optimization
design of a linearly separable pattern-classifier is formulated as a constrained convex
optimization problem, expressed in words as: Minimize the Euclidean norm of the
adjustable parameter vector of a linearly separable pattern-classifier, subject to the
constraint that the parameter vector correctly classifies the supervisory training data.
Hence, in mathematical terms, the objective function is expressed as a Lagrange
function, in which the Lagrange multipliers constitute a set of auxiliary nonnegative
variables equal in number to the training-sample size. Three important points emerge
from differentiating the Lagrangian function with respect to the adjustable parameters,
including a bias term (a formulation along these lines is sketched after the list below):
1. The Lagrange multipliers are divided into two sets: one zero, and the other
nonzero.
2. The constraints are satisfied with the equality sign only for those training points
whose Lagrange multipliers are nonzero, in accordance with the Karush-Kuhn-Tucker
(KKT) conditions of convex optimization theory.
3. The supervisory data points associated with the nonzero Lagrange multipliers,
that is, those for which the constraints hold with equality, are called support
vectors; in effect, they are the data points that are the most difficult to classify.
In light of this third point, the solution is said to be sparse in the sense that only
a fraction of the training data define the boundaries of the margin of separation.
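In symbols, with training pairs $(\mathbf{x}_i, d_i)$, $d_i \in \{-1, +1\}$, $i = 1, \ldots, N$, weight vector $\mathbf{w}$, and bias $b$ (notation introduced here for illustration rather than taken from the text), the standard primal formulation for the linearly separable case is

$$\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\,\mathbf{w}^{\mathsf T}\mathbf{w} \quad \text{subject to} \quad d_i\left(\mathbf{w}^{\mathsf T}\mathbf{x}_i + b\right) \ge 1, \qquad i = 1, \ldots, N,$$

with the corresponding Lagrange function

$$J(\mathbf{w}, b, \boldsymbol{\alpha}) = \tfrac{1}{2}\,\mathbf{w}^{\mathsf T}\mathbf{w} - \sum_{i=1}^{N} \alpha_i\left[d_i\left(\mathbf{w}^{\mathsf T}\mathbf{x}_i + b\right) - 1\right], \qquad \alpha_i \ge 0.$$

Setting $\partial J/\partial \mathbf{w} = \mathbf{0}$ and $\partial J/\partial b = 0$ gives $\mathbf{w} = \sum_i \alpha_i d_i \mathbf{x}_i$ and $\sum_i \alpha_i d_i = 0$, while the KKT complementarity condition $\alpha_i\left[d_i\left(\mathbf{w}^{\mathsf T}\mathbf{x}_i + b\right) - 1\right] = 0$ shows that $\alpha_i > 0$ only for points lying exactly on the margin, namely the support vectors.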
Given such a constrained convex optimization problem, it is possible to construct
another problem, called the dual problem. Because the primal problem is convex and
its constraints are affine, this second problem has the same optimal value as the
primal problem (that is, the original optimization problem), but with an important
difference: the optimal solution to the primal problem is expressed in terms of the
free network parameters, whereas the Lagrange multipliers provide the optimal
solution to the dual problem. This statement is of profound practical
importance as it teaches us that the decision boundary for the linearly separable
pattern-classification problem can be computed without having to find the optimal
value of the parameter vector. To be more specific, the objective function to be
maximized in the dual problem is expressed solely in terms of the training data,
through inner products of the input vectors, with the Lagrange multipliers as the
free variables of the optimization.
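Concretely, with the same notation as above, the dual problem for the linearly separable case takes the well-known form

$$\max_{\boldsymbol{\alpha}}\ Q(\boldsymbol{\alpha}) = \sum_{i=1}^{N} \alpha_i - \tfrac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j d_i d_j\, \mathbf{x}_i^{\mathsf T}\mathbf{x}_j \quad \text{subject to} \quad \sum_{i=1}^{N} \alpha_i d_i = 0, \quad \alpha_i \ge 0,$$

after which the decision rule follows from the support vectors alone, $f(\mathbf{x}) = \operatorname{sgn}\left(\sum_{i \in \mathrm{SV}} \alpha_i d_i\, \mathbf{x}_i^{\mathsf T}\mathbf{x} + b\right)$. As a minimal illustration only, the following Python sketch solves this dual with a general-purpose constrained optimizer; the function name, the tolerance used to identify support vectors, and the choice of scipy.optimize are assumptions made for this example, and a dedicated QP or SMO routine would normally be preferred in practice.

import numpy as np
from scipy.optimize import minimize

def train_linear_svm_dual(X, d):
    """Hard-margin SVM trained via its dual. X: (N, p) inputs, d: (N,) labels in {-1, +1}."""
    N = X.shape[0]
    # Matrix G[i, j] = d_i d_j x_i^T x_j appearing in the dual objective.
    Y = d[:, None] * X
    G = Y @ Y.T

    # scipy minimizes, so we negate Q(alpha) = sum(alpha) - 0.5 * alpha^T G alpha.
    fun = lambda a: 0.5 * a @ G @ a - a.sum()
    jac = lambda a: G @ a - np.ones(N)
    cons = [{'type': 'eq', 'fun': lambda a: a @ d, 'jac': lambda a: d}]
    bounds = [(0.0, None)] * N            # alpha_i >= 0

    res = minimize(fun, np.zeros(N), jac=jac, bounds=bounds,
                   constraints=cons, method='SLSQP')
    alpha = res.x

    # Recover the primal solution: w = sum_i alpha_i d_i x_i; b from points on the margin.
    w = (alpha * d) @ X
    sv = alpha > 1e-6                     # support vectors: nonzero multipliers
    b = float(np.mean(d[sv] - X[sv] @ w))
    return w, b, alpha

# Example usage on two linearly separable clusters (illustrative data only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (20, 2)), rng.normal(+2.0, 0.5, (20, 2))])
d = np.r_[-np.ones(20), np.ones(20)]
w, b, alpha = train_linear_svm_dual(X, d)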
 