The features are not measurable.
There is no rejection class: all customers have an opinion on the comfort
of a vehicle.
The fact that the features are not measurable precludes the use of a statistical
method. In such a situation, a fuzzy classification method would be more
appropriate.
Handwritten digit recognition, for instance zip code recognition, has been
investigated in detail, and many applications are in routine operation.
Consider the answers to the three questions that were asked in the previous
two examples.
In sharp contrast with the example of the vending machines, the huge
diversity of handwriting styles makes the choice of features nontrivial, but
feasible; unlike the vehicle comfort assessment problem, different persons
who read the same digit will assign it to the same class (unless the digit
is nearly illegible).
Features are numbers that can be extracted from the picture: in a typical
low-level representation, the features would be the intensities of the pixels;
in a high-level description, the features would be the location of horizontal,
vertical or diagonal segments, the presence and location of loops, etc.
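To make the two levels of description concrete, here is a minimal sketch under stated assumptions: a digit image is a list of rows of grayscale values, with 0 for background and 255 for ink (bright ink on a dark background is assumed). The function names, the 128 ink threshold and the 4-pixel minimum segment length are hypothetical choices for illustration. The "horizontal segment" detector is deliberately crude and only hints at what a real high-level feature extractor (segments, loops, their locations) would compute.

```python
from typing import List

# Hypothetical representation: grayscale rows, 0 = background, 255 = ink.
Image = List[List[int]]


def low_level_features(image: Image) -> List[float]:
    """Low-level representation: every pixel intensity, scaled to [0, 1], row by row."""
    return [pixel / 255.0 for row in image for pixel in row]


def longest_ink_run(row: List[int], ink_threshold: int) -> int:
    """Length of the longest run of consecutive ink pixels in one row."""
    best = run = 0
    for pixel in row:
        run = run + 1 if pixel >= ink_threshold else 0
        best = max(best, run)
    return best


def horizontal_segment_rows(image: Image, ink_threshold: int = 128,
                            min_length: int = 4) -> List[int]:
    """Crude high-level feature: indices of rows that contain a horizontal
    segment, i.e. a run of at least `min_length` ink pixels."""
    return [r for r, row in enumerate(image)
            if longest_ink_run(row, ink_threshold) >= min_length]


# A 5x5 "7": a horizontal bar on top and a diagonal stroke below it.
seven = [
    [255, 255, 255, 255, 255],
    [0, 0, 0, 255, 0],
    [0, 0, 255, 0, 0],
    [0, 255, 0, 0, 0],
    [255, 0, 0, 0, 0],
]
print(len(low_level_features(seven)))   # 25
print(horizontal_segment_rows(seven))   # [0]
```

The low-level vector has one component per pixel (25 here), whereas the high-level description summarizes the same picture as "one horizontal segment, in the top row", which is far more compact and less sensitive to small shifts of the stroke.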
The size of the rejection class can be defined; in some cases, it is itself
a performance criterion. In mail processing, a rejected envelope requires a
manual operation, which is less costly than sending a letter to the wrong
address. Hence, the performance requirement is expressed as follows: for a
given error rate (typically 1%), the rejection rate should be as low as
possible.
Clearly, it would be easy to design a classifier that never gives a wrong
answer, simply by rejecting all patterns. Given the economic constraints of
zip code reading, however, a “good” classifier makes a decision as often as
possible while making no more than 1% mistakes. If the economic constraints
were different, i.e., if a mistake were less costly than a human operation,
the classifier should instead have the smallest possible error rate for a
given maximum rejection rate (this is the case for large-scale automated
medical diagnosis, where referring a case to a medical doctor is more costly
than delivering a wrong diagnosis).
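Both performance requirements can be met by thresholding the classifier's confidence: a pattern is rejected when its confidence falls below a threshold, and the threshold is tuned on labeled validation data. The sketch below assumes each validation pattern comes with a confidence score and a flag saying whether the classifier's answer was correct; the function names are hypothetical, and both rates are assumed to be fractions of *all* patterns (the error rate could instead be measured over accepted patterns only). The toy data are far too small for a 1% target, so a 20% cap is used in the usage example.

```python
from typing import List, Tuple

# (threshold, error_rate, rejection_rate)
OperatingPoint = Tuple[float, float, float]


def operating_points(confidences: List[float],
                     correct: List[bool]) -> List[OperatingPoint]:
    """Error and rejection rates obtained with each candidate threshold.

    A pattern is rejected when its confidence is below the threshold.
    ASSUMPTION: both rates are fractions of all patterns.
    """
    n = len(confidences)
    points = [(float("inf"), 0.0, 1.0)]  # reject everything: no errors at all
    for t in sorted(set(confidences)):
        accepted = [ok for c, ok in zip(confidences, correct) if c >= t]
        errors = sum(1 for ok in accepted if not ok)
        points.append((t, errors / n, 1 - len(accepted) / n))
    return points


def min_rejection_at_error(points: List[OperatingPoint],
                           max_error: float) -> OperatingPoint:
    """Zip-code setting: cap the error rate, then reject as little as possible."""
    feasible = [p for p in points if p[1] <= max_error]
    return min(feasible, key=lambda p: (p[2], p[1]))


def min_error_at_rejection(points: List[OperatingPoint],
                           max_rejection: float) -> OperatingPoint:
    """Opposite setting: cap the rejection rate, then make as few errors as possible."""
    feasible = [p for p in points if p[2] <= max_rejection]
    return min(feasible, key=lambda p: (p[1], p[2]))


conf = [0.99, 0.95, 0.90, 0.80, 0.60, 0.55]
ok = [True, True, True, False, True, False]
pts = operating_points(conf, ok)
print(min_rejection_at_error(pts, 0.2)[0])   # 0.6
print(min_error_at_rejection(pts, 0.1)[0])   # 0.55
```

The "reject everything" point is always feasible in the first setting (it makes no errors), which mirrors the remark that a classifier rejecting all patterns is trivially error-free; the useful answer is the feasible threshold that rejects least.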
In the handwritten digit recognition problem, statistical classification
methods such as neural networks are perfectly appropriate, provided a suitable
database is available. As in most nonacademic problems, the central question
is that of data representation: a thoughtful representation design, together
with data pre-processing techniques such as those described in Chap. 3, is
often as important as the classification algorithm itself.
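The pre-processing techniques of Chap. 3 are not reproduced here. As a hedged illustration of the general idea only, the sketch below shows one common step, per-feature standardization: each feature is centered on its training-set mean and divided by its training-set standard deviation, so that no feature dominates merely because of its scale. The function names are hypothetical, and the statistics are estimated on training rows only, then reused unchanged on new patterns.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def fit_standardizer(rows: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature mean and standard deviation, estimated on training rows only."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant feature (e.g. a border pixel that is always blank) has zero
    # spread; use a scale of 1 so it maps to 0 instead of dividing by zero.
    scales = [pstdev(col) or 1.0 for col in columns]
    return means, scales


def standardize(row: List[float], means: List[float],
                scales: List[float]) -> List[float]:
    """Center each feature and divide it by its training-set spread."""
    return [(x - m) / s for x, m, s in zip(row, means, scales)]


train = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0]]
means, scales = fit_standardizer(train)
print(standardize([2.0, 1.0], means, scales))  # [0.0, 0.0]
```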