• The features are not measurable.
• There is no rejection class: all customers have an opinion on the comfort
  of a vehicle.
The fact that the features are not measurable precludes the use of a statistical
method. In such a situation, a fuzzy classification method would be more
appropriate.
Handwritten digit recognition, for instance zip code recognition, has been
investigated in detail, and many applications are in routine operation.
Consider the answers to the three questions that were asked in the previous
two examples.
• In sharp contrast with the example of the vending machines, the huge
  diversity of handwriting styles makes the choice of features nontrivial, but
  feasible; in contrast to the vehicle comfort assessment problem, different
  persons who read the same digit will assign it to the same class (except if
  the digit is almost illegible).
• Features are numbers that can be extracted from the picture: in a typical
  low-level representation, the features would be the intensities of the pixels;
  in a high-level description, the features would be the location of horizontal,
  vertical or diagonal segments, the presence and location of loops, etc.
• The size of the rejection class can be defined, and in some cases it is a
  performance criterion. In mail processing, a rejected envelope requires a
  manual operation, which is less costly than sending a letter to the wrong
  address. Hence, the performance requirement is expressed as follows: for a
  given error rate (typically 1%), the rejection rate should be as low as
  possible. Clearly, it would be easy to design a classifier that never gives
  a wrong answer, simply by rejecting all patterns; by contrast, given the
  economic constraints of zip code reading, a “good” classifier makes a
  decision as often as possible while making mistakes no more than 1% of the
  time. If the economic constraints were different, i.e., if a mistake were
  less costly than a human operation, the classifier should have the smallest
  possible error rate for a given maximum rejection rate (this is the case for
  large-scale automated medical diagnosis, where resorting to a medical doctor
  is more costly than delivering a wrong diagnosis).
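The performance requirement stated for zip code reading can be illustrated
with a minimal sketch. It assumes, hypothetically, that the classifier attaches
a confidence score to each decision and that a pattern is rejected when its
score falls below a threshold; the function names and the toy data are
illustrative only. The error rate is counted here over all patterns presented
(accepted or rejected), which is one possible convention among several.

```python
from typing import List, Tuple


def rates_at_threshold(
    confidences: List[float], correct: List[bool], threshold: float
) -> Tuple[float, float]:
    """Return (error_rate, rejection_rate) when every pattern whose
    confidence is below `threshold` is rejected.

    Assumption: both rates are fractions of ALL patterns presented.
    """
    n = len(confidences)
    rejected = sum(1 for c in confidences if c < threshold)
    errors = sum(
        1 for c, ok in zip(confidences, correct) if c >= threshold and not ok
    )
    return errors / n, rejected / n


def lowest_rejection_threshold(
    confidences: List[float], correct: List[bool], max_error_rate: float
) -> Tuple[float, float, float]:
    """Return (threshold, error_rate, rejection_rate) for the threshold
    that rejects as few patterns as possible while keeping the error
    rate at or below `max_error_rate`.
    """
    # Candidate thresholds in ascending order. The final candidate,
    # +infinity, rejects every pattern and therefore makes no error.
    candidates = sorted(set(confidences)) + [float("inf")]
    for t in candidates:
        err, rej = rates_at_threshold(confidences, correct, t)
        # Raising the threshold can only remove accepted errors, so the
        # first threshold meeting the constraint rejects the fewest patterns.
        if err <= max_error_rate:
            return t, err, rej
    raise ValueError("unreachable: rejecting everything yields zero errors")


if __name__ == "__main__":
    # Hypothetical validation results: confidence score of each decision
    # and whether that decision was correct.
    scores = [0.99, 0.95, 0.90, 0.80, 0.60, 0.55]
    is_correct = [True, True, True, False, True, False]
    print(lowest_rejection_threshold(scores, is_correct, max_error_rate=0.2))
```

For the opposite economic constraint (smallest error rate for a given maximum
rejection rate), the same scan could be run with the roles of the two rates
exchanged: choose the highest threshold whose rejection rate stays within the
allowed maximum.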
In this last example, statistical classification methods such as neural
networks are perfectly appropriate, provided a suitable database is available.
As in most nonacademic problems, the central question is that of data
representation: a thoughtful representation design, together with data
pre-processing techniques such as those described in Chap. 3, is often as
important as the classification algorithm itself.
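The difference between a low-level and a high-level representation of a digit
can be made concrete with a small sketch. It assumes a binarized image stored
as a list of rows (1 for ink, 0 for background); the function names are
hypothetical, and counting loops by flood-filling the background is only one
simple way of obtaining such a feature, not a method prescribed by the text.

```python
from typing import List

Image = List[List[int]]  # assumed encoding: 1 = ink, 0 = background


def pixel_features(image: Image) -> List[float]:
    """Low-level representation: the pixel intensities, row by row."""
    return [float(p) for row in image for p in row]


def count_loops(image: Image) -> int:
    """High-level feature: number of background regions fully enclosed
    by ink (4-connected background; an illustrative convention).
    """
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]

    def flood(r0: int, c0: int) -> None:
        # Iterative flood fill over 4-connected background pixels.
        stack = [(r0, c0)]
        seen[r0][c0] = True
        while stack:
            r, c = stack.pop()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if (
                    0 <= nr < h
                    and 0 <= nc < w
                    and not seen[nr][nc]
                    and image[nr][nc] == 0
                ):
                    seen[nr][nc] = True
                    stack.append((nr, nc))

    # First, mark all background that touches the image border:
    # it lies outside the character and cannot be a loop.
    for r in range(h):
        for c in range(w):
            on_border = r in (0, h - 1) or c in (0, w - 1)
            if on_border and image[r][c] == 0 and not seen[r][c]:
                flood(r, c)

    # Every background region still unvisited is enclosed by ink.
    loops = 0
    for r in range(h):
        for c in range(w):
            if image[r][c] == 0 and not seen[r][c]:
                loops += 1
                flood(r, c)
    return loops


if __name__ == "__main__":
    eight = [
        [1, 1, 1],
        [1, 0, 1],
        [1, 1, 1],
        [1, 0, 1],
        [1, 1, 1],
    ]
    print(len(pixel_features(eight)), count_loops(eight))
```

The pixel vector grows with the image size and changes with every shift of the
character, whereas the loop count is a single number that is unaffected by
such shifts; this illustrates why the choice of representation matters.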