6.2.2 Support Vector Machine
The support vector machine (SVM) is basically a binary classification machine
with some highly elegant properties. It was pioneered by Vapnik; the first
description of the machine was presented in [3], and detailed treatments appear
in [1, 2].
Unlike the BP algorithm, supervised training of the SVM looks to optimization
theory for its derivation. To develop insight into SVM theory, the formulation of the
learning algorithm begins with the simplest case: a pair of linearly separable
pattern classes. In such a setting, the objective is to maximize the margin of separation
between the two classes, given a set of training data made up of input vector-desired
response pairs. Appealing to statistical learning theory, we find that the maximization
of the margin of separation is equivalent to minimizing the Euclidean norm of
the adjustable parameter vector. Building on this important finding, the optimization
design of a linearly separable pattern-classifier is formulated as a constrained convex
optimization problem, expressed in words as: Minimize the Euclidean norm of the
adjustable parameter vector of a linearly separable pattern-classifier, subject to the
constraint that the parameter vector correctly classifies the supervisory training data.
Hence, in mathematical terms, the objective function is expressed as a Lagrange
function, in which the Lagrange multipliers constitute a set of auxiliary nonnegative
variables equal in number to the training-sample size. Three important points emerge
from differentiating the Lagrangian function with respect to the adjustable parameters,
including a bias term (a formulation along these lines is sketched after the list below):
1. The Lagrange multipliers are divided into two sets: one zero, and the other
nonzero.
2. The constraints are satisfied with the equality sign only for those training points
whose Lagrange multipliers are nonzero, in accordance with the Karush-Kuhn-Tucker
(KKT) conditions of convex optimization theory.
3. The supervisory data points associated with the nonzero Lagrange multipliers,
that is, those for which the constraints hold with equality, are called support
vectors; in effect, they are the data points that are the most difficult to classify.
In light of this third point, the solution is said to be sparse in the sense that only
a fraction of the training data define the boundaries of the margin of separation.
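In symbols, with training pairs $(\mathbf{x}_i, d_i)$, $d_i \in \{-1, +1\}$, $i = 1, \ldots, N$, weight vector $\mathbf{w}$, and bias $b$ (notation introduced here for illustration rather than taken from the text), the standard primal formulation for the linearly separable case is

$$\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\,\mathbf{w}^{\mathsf T}\mathbf{w} \quad \text{subject to} \quad d_i\left(\mathbf{w}^{\mathsf T}\mathbf{x}_i + b\right) \ge 1, \qquad i = 1, \ldots, N,$$

with the corresponding Lagrange function

$$J(\mathbf{w}, b, \boldsymbol{\alpha}) = \tfrac{1}{2}\,\mathbf{w}^{\mathsf T}\mathbf{w} - \sum_{i=1}^{N} \alpha_i\left[d_i\left(\mathbf{w}^{\mathsf T}\mathbf{x}_i + b\right) - 1\right], \qquad \alpha_i \ge 0.$$

Setting $\partial J/\partial \mathbf{w} = \mathbf{0}$ and $\partial J/\partial b = 0$ gives $\mathbf{w} = \sum_i \alpha_i d_i \mathbf{x}_i$ and $\sum_i \alpha_i d_i = 0$, while the KKT complementarity condition $\alpha_i\left[d_i\left(\mathbf{w}^{\mathsf T}\mathbf{x}_i + b\right) - 1\right] = 0$ shows that $\alpha_i > 0$ only for points lying exactly on the margin, namely the support vectors.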
Given such a constrained convex optimization problem, it is possible to construct
another problem, called the dual problem. Because the primal problem is convex and
its constraints are affine, this second problem has the same optimal value as the
primal problem (that is, the original optimization problem), but with an important
difference: the optimal solution to the primal problem is expressed in terms of the
free network parameters, whereas the Lagrange multipliers provide the optimal
solution to the dual problem. This statement is of profound practical
importance as it teaches us that the decision boundary for the linearly separable
pattern-classification problem can be computed without having to find the optimal
value of the parameter vector. To be more specific, the objective function to be
maximized in the dual problem is expressed solely in terms of the training data,
through inner products of the input vectors, with the Lagrange multipliers as the
free variables of the optimization.
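Concretely, with the same notation as above, the dual problem for the linearly separable case takes the well-known form

$$\max_{\boldsymbol{\alpha}}\ Q(\boldsymbol{\alpha}) = \sum_{i=1}^{N} \alpha_i - \tfrac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j d_i d_j\, \mathbf{x}_i^{\mathsf T}\mathbf{x}_j \quad \text{subject to} \quad \sum_{i=1}^{N} \alpha_i d_i = 0, \quad \alpha_i \ge 0,$$

after which the decision rule follows from the support vectors alone, $f(\mathbf{x}) = \operatorname{sgn}\left(\sum_{i \in \mathrm{SV}} \alpha_i d_i\, \mathbf{x}_i^{\mathsf T}\mathbf{x} + b\right)$. As a minimal illustration only, the following Python sketch solves this dual with a general-purpose constrained optimizer; the function name, the tolerance used to identify support vectors, and the choice of scipy.optimize are assumptions made for this example, and a dedicated QP or SMO routine would normally be preferred in practice.

import numpy as np
from scipy.optimize import minimize

def train_linear_svm_dual(X, d):
    """Hard-margin SVM trained via its dual. X: (N, p) inputs, d: (N,) labels in {-1, +1}."""
    N = X.shape[0]
    # Matrix G[i, j] = d_i d_j x_i^T x_j appearing in the dual objective.
    Y = d[:, None] * X
    G = Y @ Y.T

    # scipy minimizes, so we negate Q(alpha) = sum(alpha) - 0.5 * alpha^T G alpha.
    fun = lambda a: 0.5 * a @ G @ a - a.sum()
    jac = lambda a: G @ a - np.ones(N)
    cons = [{'type': 'eq', 'fun': lambda a: a @ d, 'jac': lambda a: d}]
    bounds = [(0.0, None)] * N            # alpha_i >= 0

    res = minimize(fun, np.zeros(N), jac=jac, bounds=bounds,
                   constraints=cons, method='SLSQP')
    alpha = res.x

    # Recover the primal solution: w = sum_i alpha_i d_i x_i; b from points on the margin.
    w = (alpha * d) @ X
    sv = alpha > 1e-6                     # support vectors: nonzero multipliers
    b = float(np.mean(d[sv] - X[sv] @ w))
    return w, b, alpha

# Example usage on two linearly separable clusters (illustrative data only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (20, 2)), rng.normal(+2.0, 0.5, (20, 2))])
d = np.r_[-np.ones(20), np.ones(20)]
w, b, alpha = train_linear_svm_dual(X, d)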
 