Remark 1. If the training set is not linearly separable in the feature space determined by the function Φ(x), the algorithm does not converge and does not even provide an approximate solution.
Remark 2. The same algorithm of constrained quadratic minimization may
be used to find the maximal margin perceptron in input space, if the training
set is known to be linearly separable. Otherwise, one has to map the patterns
to a feature space of dimension high enough to guarantee linear separation.
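To make Remark 2 concrete, here is a minimal sketch in Python; the XOR-like data, the quadratic feature map and the separating weight vector are hypothetical choices for this illustration, not taken from the text. The training set is not linearly separable in the two-dimensional input space, but becomes separable once the patterns are mapped to a three-dimensional feature space.

```python
import numpy as np

# XOR-like training set: not linearly separable in the 2-D input space.
X = np.array([[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]])
y = np.array([-1, 1, 1, -1])

def phi(x):
    """Hypothetical quadratic feature map: appends the product of the coordinates."""
    return np.array([x[0], x[1], x[0] * x[1]])

# In the 3-D feature space the hyperplane w . Phi(x) = 0 with w = (0, 0, -1)
# separates the classes: y_k * (w . Phi(x_k)) > 0 for every example.
w = np.array([0.0, 0.0, -1.0])
margins = y * np.array([w @ phi(x) for x in X])
print(margins)  # [1. 1. 1. 1.] -- every example satisfies the hard margin constraint
```

Once the patterns are mapped in this way, the constrained quadratic minimization mentioned above can be applied to the mapped examples exactly as in the linearly separable case.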
If the hard margin constraints cannot be satisfied, two possibilities arise:
either another mapping ϕ is chosen, or the constraint that all examples should be correctly classified and lie outside the margin is relaxed. In the latter case, new variables ζ_k, called slack variables, are introduced. Then the
function to be minimized is
$$\Gamma(C) \;=\; \frac{1}{2}\, w \cdot w \;+\; C \sum_{k=1}^{M} (\zeta_k)^n ,$$
where C is a positive hyperparameter (C > 0) to be chosen, and n a positive exponent (n > 0), under the constraints
$$y_k\, \Phi(x_k) \cdot w \;\geq\; 1 - \zeta_k , \qquad \zeta_k \geq 0 .$$
The minimization of Γ(C) under the above two constraints defines the soft margin SVM. Clearly, the slack variables allow the hard margin constraints to be violated. If 0 < ζ_k < 1, the distance of the examples to the hyperplane is smaller than 1/‖w‖, but they are correctly classified. Conversely, examples with ζ_k > 1 are incorrectly classified. In order to minimize the number of misclassifications, they must be penalized: that is the rationale for including the second term in the cost function Γ(C). The larger the exponent n, the more heavily misclassifications are penalized. However, in order to keep a quadratic minimization problem, the possible values of n are restricted to n = 1 or
n = 2. In such conditions, the solution of the soft margin problem is still
unique, and can be found using algorithms of quadratic minimization under
constraints. The solution has the same form as the above expression of w, with the examples with ζ_k ≠ 0 included in the sum. In principle, the coefficient C in Γ(C) is arbitrary. Its value controls the relative importance given to the training errors with respect to the margin maximization. Theoretical studies have shown that the value of C has a strong influence on the generalization properties of soft margin SVMs. The choice of its value is a practical problem, and several heuristics have been proposed for applications of the algorithm.
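As a rough illustration of the role of C, the following Python sketch minimizes Γ(C) for n = 2, working directly in input space (Φ taken as the identity) and without a bias term, on hypothetical two-class data. It uses the fact that, for a fixed w, the constraints are satisfied at minimal cost by ζ_k = max(0, 1 − y_k w · x_k), so Γ(C) can be minimized over w alone; the data, the values of C and the use of scipy's BFGS routine are assumptions made for this sketch.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical two-class data in the plane (Phi = identity, no bias term).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.5, 1.0, (20, 2)), rng.normal(1.5, 1.0, (20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])

def gamma(w, C):
    """Gamma(C) for n = 2: for fixed w the optimal slack is zeta_k = max(0, 1 - y_k w.x_k)."""
    slack = np.maximum(0.0, 1.0 - y * (X @ w))
    return 0.5 * w @ w + C * np.sum(slack ** 2)

# Sweep a few values of C to see the trade-off between margin width and slack.
for C in (0.01, 1.0, 100.0):
    res = minimize(gamma, x0=np.zeros(2), args=(C,), method="BFGS")
    w_opt = res.x
    violations = int(np.sum(y * (X @ w_opt) < 1.0))
    print(f"C={C:7.2f}  ||w||={np.linalg.norm(w_opt):.3f}  margin violations={violations}")
```

Typically, larger values of C penalize the slack variables more heavily and lead to a larger ‖w‖, hence a narrower margin, with fewer margin violations on the training set; smaller values tolerate more violations in exchange for a wider margin, which is precisely the trade-off discussed above.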
In practice, there is no need to define the mapping ϕ : x → Φ(x) explicitly. In fact, from the expression of w, classifying a new input x only requires the scalar products Φ(x) · Φ(x_k) of the input features with those of the support vectors.
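The following sketch illustrates that point: once w is expressed as a weighted sum of mapped support vectors, the class of a new input is obtained from kernel evaluations K(x, x_k) = Φ(x) · Φ(x_k) alone, without ever computing Φ explicitly. The Gaussian kernel, the support vectors and the dual coefficients α_k below are hypothetical placeholders; in an actual SVM the α_k would come from solving the quadratic programming problem.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """K(a, b) = Phi(a) . Phi(b) for the Gaussian feature map, never computed explicitly."""
    return np.exp(-np.sum((a - b) ** 2) / (2.0 * sigma ** 2))

# Hypothetical support vectors, labels and dual coefficients alpha_k;
# in a real SVM these would come from solving the dual quadratic program.
support_vectors = np.array([[0.0, 1.0], [1.0, 0.0], [-1.0, -1.0]])
y_sv = np.array([1, 1, -1])
alpha = np.array([0.7, 0.3, 1.0])

def classify(x):
    """Sign of sum_k alpha_k y_k K(x, x_k): only scalar products in feature space are needed."""
    s = sum(a * yk * gaussian_kernel(x, xk)
            for a, yk, xk in zip(alpha, y_sv, support_vectors))
    return 1 if s >= 0 else -1

print(classify(np.array([0.5, 0.5])))    # close to the positive support vectors -> +1
print(classify(np.array([-1.2, -0.8])))  # close to the negative support vector -> -1
```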