vectors. If the mapping ϕ satisfies some general (not too restrictive) conditions,
those scalar products may be evaluated directly in input space through

Φ(x) · Φ(y) = K(x, y),
where K(x, y) is a convolution operator or kernel. Therefore, after training,
only the support vectors in input space and the coefficients c_k must be stored
in memory. They allow computing the classifier response to any input vector.
Thus, if the kernel K is known, it is unnecessary to keep in memory the weight
vector w, which may have a huge number of components (exactly the dimension
of feature space plus 1, generally much larger than N, since w is a vector in
feature space). Moreover, it is not even necessary to define the mapping ϕ
explicitly: the kernel is sufficient. This is why SVM's are also called kernel
machines. One of the most popular kernels is the
Gaussian operator
K(x, y) = exp(−‖x − y‖² / σ²),
generally called RBF (for radial basis function). That kernel corresponds to an
infinite-dimensional feature space! Such operators were mentioned previously,
in Chap. 1.
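To make this concrete, the following minimal Python sketch (not taken from the text; the function and variable names, and the numerical values, are illustrative assumptions) computes the classifier response from the stored support vectors and coefficients c_k alone, using the Gaussian kernel defined above; the weight vector w in feature space never appears.

import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    # Gaussian (RBF) kernel K(x, y) = exp(-||x - y||^2 / sigma^2)
    return np.exp(-np.sum((x - y) ** 2) / sigma ** 2)

def svm_response(x, support_vectors, coefficients, bias, sigma=1.0):
    # Classifier response: sign( sum_k c_k K(x_k, x) + b ).
    # Only the support vectors x_k and their coefficients c_k are stored;
    # the weight vector w in feature space is never formed explicitly.
    s = sum(c * rbf_kernel(xk, x, sigma)
            for xk, c in zip(support_vectors, coefficients))
    return np.sign(s + bias)

# Hypothetical support vectors and coefficients, for illustration only.
support_vectors = [np.array([0.0, 1.0]), np.array([1.0, 0.0])]
coefficients = [0.7, -0.7]   # c_k; the sign carries the class label
print(svm_response(np.array([0.2, 0.9]), support_vectors, coefficients, bias=0.0))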
As explained at the end of this chapter, the generalization error is an in-
creasing function of the ratio of the dimension of the space where the per-
ceptron performs the separation to the number of training patterns M. In
the case of SVM's, the former is the feature space. Since the number M of
training patterns remains fixed, one can wonder whether SVM's are able to
generalize at all [Buhot et al. 2000]. It has been proved that the generalization
error of the SVM's is bounded by the fraction of training patterns that are
support vectors. Note that only values smaller than 0.5 are of interest, since
a generalization error of 0.5 means that the classifier has probability 0.5 of
misclassification: in other words, it has the same performance as a random
classifier. The interest of this bound is that it can be determined in applica-
tions (it amounts to simply counting the number of support vectors and
dividing it by M). That problem, as well as other properties of SVM's, is subject
to active theoretical research (see for example [Risau-Gusman et al. 2000a,b,
2001; Dietrich et al. 1999; Risau-Gusman et al. 2002]). The interested reader
may refer to the Ph.D. thesis [Risau-Gusman 2001].
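In practice, this bound can be read off directly from a trained SVM. The following sketch (Python with scikit-learn, a library not discussed in the text; the data and parameter values are hypothetical) trains a Gaussian-kernel classifier and divides the number of support vectors by M.

import numpy as np
from sklearn.svm import SVC

# Two Gaussian clouds as a stand-in for real training data (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (500, 2)), rng.normal(1, 1, (500, 2))])
y = np.array([0] * 500 + [1] * 500)
M = len(X)

clf = SVC(kernel='rbf', gamma=1.0).fit(X, y)   # Gaussian-kernel SVM
n_sv = len(clf.support_vectors_)               # number of support vectors

# Bound on the generalization error: fraction of training patterns
# that are support vectors (informative only when below 0.5).
print(n_sv / M)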
6.6 Problems with More than Two Classes
A straightforward way of discriminating patterns among more than two classes
is to separate each class from all the others. A problem with K classes
y_1, y_2, ..., y_K is thus reduced to K problems of binary discrimination, with
each classifier dedicated to recognizing one and only one class. However, it may
happen that more than one classifier recognizes the same input pattern. In