in multi-class problems that are claimed to be difficult, the examples turn out
to be pairwise linearly separable. In such cases, powerful, elegant algorithms
give excellent solutions, as explained in Chap. 6: therefore, the first step in
the design of a classifier consists in checking whether the classes are pairwise
linearly separable. Ho and Kashyap's algorithm [Ho 1965], which was discov-
ered long before neural networks came into existence, provides an answer to
that problem in finite time:
• If the examples are linearly separable, the algorithm finds a separating hyperplane in finite time.
• If the examples are not linearly separable, the algorithm signals it in finite time (see algorithmic complements at the end of this chapter).
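In its usual formulation, the Ho and Kashyap procedure stacks the augmented patterns of one class and the negated augmented patterns of the other into a matrix Y, and iteratively seeks a weight vector w and a strictly positive margin vector b such that Yw = b; reaching that goal yields a separating hyperplane, whereas a non-positive, non-zero error vector proves that none exists. The following Python/NumPy sketch only illustrates that scheme; the function name and the numerical settings (eta, max_iter, tol) are choices made here, not part of the algorithm's original statement.

import numpy as np

def ho_kashyap(X_a, X_b, eta=0.5, max_iter=10000, tol=1e-6):
    """Ho-Kashyap test of linear separability between two classes.

    X_a, X_b: arrays of shape (n_a, d) and (n_b, d) holding the examples
    of the two classes. Returns (separable, w), where w is an augmented
    weight vector (d weights plus a bias) if a separating hyperplane was
    found, and None otherwise.
    """
    # Augment each pattern with a constant input of 1 and negate the
    # class-B patterns, so that separability amounts to Y @ w > 0 rowwise.
    Y = np.vstack([np.hstack([X_a, np.ones((len(X_a), 1))]),
                   -np.hstack([X_b, np.ones((len(X_b), 1))])])
    Y_pinv = np.linalg.pinv(Y)
    b = np.ones(len(Y))                # target margins, kept strictly positive
    w = Y_pinv @ b
    for _ in range(max_iter):
        e = Y @ w - b                  # gap between reached and required margins
        if np.max(np.abs(e)) < tol:
            return True, w             # Y @ w ~ b > 0: separating hyperplane found
        if np.max(e) <= 0 and np.min(e) < 0:
            return False, None         # error all non-positive: not separable
        e_plus = (e + np.abs(e)) / 2   # keep only the positive part of the error
        b = b + 2 * eta * e_plus       # raise the margins that are exceeded
        w = Y_pinv @ b
    return False, None                 # undecided within max_iter iterations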
The postal code database provided by the National Institute of Standards
and Technology has served as a basis for many classifier designs. It turns
out that, even with low-level representations such as a pixel representation,
the classes of examples are pairwise linearly separable [Knerr 1992]! Simi-
larly, a famous sonar signal database has been investigated in great detail by
many authors, and many complicated classifiers were designed to solve that
two-class problem; actually, in less than ten minutes on a PC, the Ho and
Kashyap algorithm, implemented as an uncompiled Matlab program, proves
that the examples of the two classes are linearly separable. Therefore, a sim-
ple Perceptron can solve the problem, without resorting to any hidden layer.
That application is investigated in more detail in Chap. 6.
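The pairwise separability results mentioned above can be checked mechanically by running the same two-class test on every pair of classes. The short sketch below assumes the ho_kashyap function of the previous sketch and a dictionary mapping each class label to the array of its examples; both names are illustrative.

from itertools import combinations

def check_pairwise_separability(patterns_by_class):
    """Apply the Ho-Kashyap test to every pair of classes.

    patterns_by_class: dict mapping a class label to an array of shape
    (n_examples, d) holding that class's examples. Returns the list of
    pairs that are NOT linearly separable (empty if every pair is
    separable, as for the databases cited above).
    """
    non_separable = []
    for label_a, label_b in combinations(patterns_by_class, 2):
        separable, _ = ho_kashyap(patterns_by_class[label_a],
                                  patterns_by_class[label_b])
        if not separable:
            non_separable.append((label_a, label_b))
    return non_separable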
1.3.5.3 Classifier Design Methodology
From the previous section, the following strategy for the design of a neural
classifier can be derived (as discussed above, one should first ascertain that
statistical classification is relevant to the problem at hand):
• Find an appropriate representation of the patterns to be classified, especially for pattern recognition (the techniques described in Chap. 3 can be particularly useful in that respect); this is a crucial step, since a representation that has a high discriminative power is likely to make the classification problem trivial; this is illustrated in applications described below.
• If the number of examples available for training the classifier is not larger than the dimension of the feature vector, there is no point in pursuing the design any further, according to Cover's theorem [Cover 1965], which is explained in Chap. 6: before proceeding to the next steps, either a more "compact" representation must be found, or additional examples must be collected, or a very stringent regularization method such as weight decay (described in Chap. 2) must be implemented (a rough sanity check is sketched after this list).
• For each pair of classes, select the relevant features with the feature selection methods described in Chap. 2; obviously, the discrimination of class A from class B may not require the same features as the discrimination of class A from class C (a pairwise selection loop is also sketched after this list).
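As a rough illustration of the last two steps, the sketch below first performs the sanity check on the number of examples suggested by Cover's theorem, then ranks the features separately for each pair of classes. A simple Fisher-type score is used here only as a stand-in for the feature-selection methods of Chap. 2, and the function names and data layout are assumptions of this sketch.

import numpy as np
from itertools import combinations

def enough_examples(patterns_by_class):
    """Cover's-theorem sanity check: the number of training examples
    should exceed the dimension of the feature vector."""
    n_examples = sum(len(X) for X in patterns_by_class.values())
    dimension = next(iter(patterns_by_class.values())).shape[1]
    return n_examples > dimension

def rank_features_for_pair(X_a, X_b):
    """Rank features for one pair of classes with a simple Fisher-type
    score (squared difference of the class means over the sum of the
    class variances, feature by feature)."""
    mean_a, mean_b = X_a.mean(axis=0), X_b.mean(axis=0)
    var_a, var_b = X_a.var(axis=0), X_b.var(axis=0)
    score = (mean_a - mean_b) ** 2 / (var_a + var_b + 1e-12)
    return np.argsort(score)[::-1]     # most discriminative features first

def pairwise_feature_ranking(patterns_by_class):
    """Rank the features separately for every pair of classes."""
    return {(a, b): rank_features_for_pair(patterns_by_class[a],
                                           patterns_by_class[b])
            for a, b in combinations(patterns_by_class, 2)}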