Information Technology Reference
In-Depth Information
Fig. 1.32. Some excerpts from the NIST handwritten digit database
one decompose a zip code into five separate digits? Having solved that di cult
problem, one must cope with the variety of styles, sizes, and orientations, of
the isolated digits. To that end, an appropriate representation must be found;
the design of an appropriate representation is completely problem-dependent,
and requires new efforts for each new application. Clearly, one cannot use
the same kind of representations for pictures such as handwritten or printed
digits, satellite pictures, or X-ray medical images.
Despite the diversity of image processing techniques, some basic operations
are found in essentially all applications, as well as in the human visual system:
edge detection, contrast enhancement, etc. In the field of handwritten charac-
ter recognition, normalization is mandatory, in order to apply the recognition
algorithm to characters of similar sizes. We have already mentioned that the
design of a real application requires a tradeoff between the complexity of the
preprocessing that is necessary to yield the chosen representation, and the
complexity of classification: a carefully designed preprocessing, which leads to
very discriminant features, may allow the use of a very simple classifier, but
the preprocessing must not be too demanding in terms of computation time.
By contrast, a simple preprocessing, such as normalization alone, may be very
fast but will not alleviate the task of the classifier. Thus, one must engineer
the best tradeoff that allows meeting the requirements of the application. Two
very different approaches to the same application will be described below.
The first approach was developed at the former AT&T Bell Labs. It con-
sists of a neural network, known as LeNet [Le Cun 1991], which makes use
of a pixel representation (after normalization). The first layers of the network
perform local preprocessings that aim at automatically extracting features
that are relevant to classification, while the final layers perform classification.
The network is shown on Fig. 1.33.
The network is input with a 16
16 pixel intensity matrix. A first layer
of hidden neurons is made of 12 sets of 64 neurons each, where each set of
×
Search WWH ::




Custom Search