vectors. If the mapping ϕ satisfies some general (not too restrictive) conditions,
those scalar products may be evaluated directly in input space through

Φ(x) · Φ(y) = K(x, y),
where K(x, y) is a convolution operator or kernel. Therefore, after training,
only the support vectors in input space and the coefficients c_k must be stored
in memory. They allow computing the classifier response to any input vector.
Thus, if the kernel K is known, it is unnecessary to keep in memory the weight
vector w, which may have a huge number of components (exactly the dimension
of feature space plus 1, generally much larger than N, since w is a vector in
feature space). Moreover, it is not even necessary to define the mapping ϕ
explicitly: the kernel is sufficient. This is why SVM's are also called kernel
machines. One of the most popular kernels is the
Gaussian operator
K(x, y) = exp(−‖x − y‖² / σ²),
generally called RBF (for radial basis function). That kernel corresponds to an
infinite-dimensional feature space! Such operators were mentioned previously,
in Chap. 1.
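To make this concrete, the following minimal Python sketch (not taken from the text; the function and variable names, and the numerical values, are illustrative assumptions) computes the classifier response from the stored support vectors and coefficients c_k alone, using the Gaussian kernel defined above; the weight vector w in feature space never appears.

import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    # Gaussian (RBF) kernel K(x, y) = exp(-||x - y||^2 / sigma^2)
    return np.exp(-np.sum((x - y) ** 2) / sigma ** 2)

def svm_response(x, support_vectors, coefficients, bias, sigma=1.0):
    # Classifier response: sign( sum_k c_k K(x_k, x) + b ).
    # Only the support vectors x_k and their coefficients c_k are stored;
    # the weight vector w in feature space is never formed explicitly.
    s = sum(c * rbf_kernel(xk, x, sigma)
            for xk, c in zip(support_vectors, coefficients))
    return np.sign(s + bias)

# Hypothetical support vectors and coefficients, for illustration only.
support_vectors = [np.array([0.0, 1.0]), np.array([1.0, 0.0])]
coefficients = [0.7, -0.7]   # c_k; the sign carries the class label
print(svm_response(np.array([0.2, 0.9]), support_vectors, coefficients, bias=0.0))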
As explained at the end of this chapter, the generalization error is an in-
creasing function of the ratio of the dimension of the space where the per-
ceptron performs the separation to the number of training patterns M. In
the case of SVM's, the former is the feature space. Since the number M of
training patterns remains fixed, one can wonder whether SVM's are able to
generalize at all [Buhot et al. 2000]. It has been proved that the generalization
error of the SVM's is bounded by the fraction of training patterns that are
support vectors. Note that only values smaller than 0.5 are of interest, since
a generalization error of 0.5 means that the classifier has probability 0.5 of
misclassification: in other words, it has the same performance as a random
classifier. The interest of this bound is that it can be determined in applica-
tions (it amounts to simply counting the number of support vectors and
dividing it by M). That problem, as well as other properties of SVM's, is subject
to active theoretical research (see for example [Risau-Gusman et al. 2000a,b,
2001; Dietrich et al. 1999; Risau-Gusman et al. 2002]). The interested reader
may refer to the Ph.D. thesis [Risau-Gusman 2001].
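In practice, this bound can be read off directly from a trained SVM. The following sketch (Python with scikit-learn, a library not discussed in the text; the data and parameter values are hypothetical) trains a Gaussian-kernel classifier and divides the number of support vectors by M.

import numpy as np
from sklearn.svm import SVC

# Two Gaussian clouds as a stand-in for real training data (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (500, 2)), rng.normal(1, 1, (500, 2))])
y = np.array([0] * 500 + [1] * 500)
M = len(X)

clf = SVC(kernel='rbf', gamma=1.0).fit(X, y)   # Gaussian-kernel SVM
n_sv = len(clf.support_vectors_)               # number of support vectors

# Bound on the generalization error: fraction of training patterns
# that are support vectors (informative only when below 0.5).
print(n_sv / M)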
6.6 Problems with More than Two Classes
A straightforward way of discriminating patterns among more than two classes
is to separate each class from all the others. A problem with K classes
y_1, y_2, ..., y_K is thus reduced to K problems of binary discrimination, with
each classifier dedicated to recognizing one and only one class. However, it may
happen that more than one classifier recognizes the same input pattern. In