The quantity that we would like to minimize is the generalization error ε_g(w), defined by

\[
\varepsilon_g(\mathbf{w}) \;=\; \sum_{y=\pm 1} \int \Theta\bigl(-y\,\sigma(\mathbf{x};\mathbf{w})\bigr)\, p_{X,Y}(\mathbf{x}, y)\, \mathrm{d}\mathbf{x},
\]
where σ is the class assigned by the classifier to the input x. The generalization error is thus the probability that the classifier with parameters w makes a classification error on an input x drawn with probability p_X(x), whose class y has probability P_Y(y | x). Clearly, the generalization error cannot be computed in actual applications because p_X(x) and P_Y(y | x) are unknown. In practice, ε_g is estimated by statistical methods such as cross-validation, as discussed in Chap. 2. Later in the present chapter, we will come back to that probabilistic formulation, because it is one of the foundations of statistical learning theory. It allows the derivation of bounds on training and generalization errors, or the estimation of their typical values. Clearly, training from examples raises the following fundamental questions:
1. What are the properties of the classifier designed through learning, and
more specifically, what is its generalization error?
2. What is the minimal number of examples needed to capture the regularities
in the data?
3. What are the properties of different training algorithms?
4. Given a training set, are the classifier parameters w unique? If multiple
solutions are possible, is there an optimal one?
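Since p_X(x) and P_Y(y | x) are unknown, ε_g can only be estimated in practice, for instance by the cross-validation procedure mentioned above. The following minimal sketch illustrates such an estimate; it assumes NumPy and scikit-learn are available, and the synthetic data and logistic-regression classifier are purely illustrative placeholders, not elements of this chapter.

```python
# Minimal sketch: estimating the generalization error eps_g by K-fold
# cross-validation, since p_X(x) and P_Y(y|x) are not available in practice.
# The data set and the classifier are illustrative placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic two-class data standing in for the examples (x_k, y_k)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

clf = LogisticRegression(max_iter=1000)

# Accuracy on each held-out fold; 1 - mean accuracy estimates eps_g
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
eps_g_estimate = 1.0 - scores.mean()
print(f"cross-validated estimate of eps_g: {eps_g_estimate:.3f}")
```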
6.1.2 Discriminant Surfaces

Assume that the inputs are vectors x ∈ R^N (the assumption of real-valued components is not essential: the results presented in this chapter are also valid for discrete-valued components, unless explicitly stated). We can represent them as colored points in an N-dimensional space, each color indicating the class of the corresponding point. The surface that separates the points of different classes is termed the discriminant surface. As shown in Fig. 6.1, that surface is not necessarily unique, and may be a combination of parts of surfaces. Training aims at determining the equation of an appropriate discriminant surface.

As indicated in Chap. 1, classification may be considered as a particular case of regression, where we seek a continuous surface g(x) whose values are close to the desired output, i.e., a function that is equal to +1 for the examples x_k of class y_k = +1 and to −1 for those of class y_k = −1, as shown in Fig. 6.2. The techniques presented in Chap. 2 can be used to find that function. The discriminant surface is the set of points x where the sign of g(x) changes.
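As a minimal illustration of this regression view of classification, the sketch below fits a linear function g(x) = w·x + b to targets ±1 by ordinary least squares (just one simple choice among the techniques of Chap. 2) and classifies each point by the sign of g(x). The two Gaussian clouds used as data are purely illustrative.

```python
# Minimal sketch of classification treated as regression: fit a linear
# function g(x) = w.x + b to targets +1/-1 by least squares, then assign
# the class sign(g(x)). The discriminant surface is the set where g(x) = 0.
import numpy as np

rng = np.random.default_rng(0)

# Two Gaussian clouds in R^2, labelled +1 and -1 (illustrative data)
X_pos = rng.normal(loc=+1.5, scale=1.0, size=(100, 2))
X_neg = rng.normal(loc=-1.5, scale=1.0, size=(100, 2))
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(100), -np.ones(100)])

# Augment the inputs with a constant column so the bias b is part of w
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])

# Least-squares solution of X_aug @ w ~ y
w, *_ = np.linalg.lstsq(X_aug, y, rcond=None)

def g(x):
    """Continuous regression output; the discriminant surface is g(x) = 0."""
    return np.hstack([x, np.ones((x.shape[0], 1))]) @ w

# The classifier's decision is the sign of g(x)
y_pred = np.sign(g(X))
training_error = np.mean(y_pred != y)
print(f"training error of sign(g(x)): {training_error:.3f}")
```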
Two situations may arise in an application: