A more satisfactory solution to Sayre's paradox would be to segment and recognize at the same time. Hidden Markov models (HMMs) are able to do this, which is one reason for their popularity in unconstrained handwriting recognition [14], [15], [16], [17], [18], [19]. The idea of applying HMMs to handwriting recognition was originally motivated by their success in speech recognition [20], where a similar conflict exists between recognition and segmentation. Over the years, numerous refinements of the basic HMM approach have been proposed, such as the writer-independent system considered in [7], which combines point-oriented and stroke-oriented input features.
However, HMMs have several well-known drawbacks. One of these is that they assume the probability of each observation depends only on the current state, which makes contextual effects difficult to model. Another is that HMMs are generative, whereas discriminative models generally give better performance on labeling and classification tasks.
Recurrent neural networks (RNNs) do not suffer from these limitations, and would therefore seem a promising alternative to HMMs. However, the application of RNNs alone to handwriting recognition has so far been limited to isolated character recognition (e.g. [21]). One reason for this is that traditional neural network objective functions require a separate training signal for every point in the input sequence, which in turn requires presegmented data.
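For example, the conventional framewise cross-entropy objective has the form

\[ \mathcal{L} = -\sum_{t=1}^{T} \log p(z_t \mid x_t), \]

which demands a target label z_t at every timestep t; supplying such targets presupposes that the input has already been segmented into characters.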
A more successful use of neural networks for handwriting recognition has been to combine them with HMMs in the so-called hybrid approach [22], [23]. A variety of network architectures have been tried for hybrid handwriting recognition, including multilayer perceptrons [24], [25], time delay neural networks (TDNNs) [18], [26], [27], and RNNs [28], [29], [30]. However, although hybrid models alleviate the difficulty of introducing context to HMMs, they still suffer from many of the drawbacks of HMMs, and they do not realize the full potential of RNNs for sequence modeling.
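In a typical hybrid system the network estimates the state posteriors p(s | x_t), which are converted into scaled likelihoods for HMM decoding via Bayes' rule:

\[ p(x_t \mid s) \propto \frac{p(s \mid x_t)}{p(s)}, \]

where p(s) is the prior state frequency estimated from the training set. The network thus improves the emission model, but the HMM's independence assumptions and decoding machinery remain in place.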
1.2 Contribution
This chapter describes a recently introduced alternative approach, in which a single RNN is trained directly for sequence labeling. The network combines a connectionist temporal classification (CTC) output layer with the bidirectional Long Short-Term Memory (BLSTM) architecture, which provides access to long-range input context in both directions. A further enhancement, which allows the network to work in multiple dimensions, will also be presented in this chapter. The so-called Multidimensional LSTM (MDLSTM) is very successful even on raw pixel data.
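The overall training setup can be illustrated with a minimal sketch pairing a bidirectional LSTM with a CTC loss. The sketch below uses PyTorch purely for illustration (the chapter itself does not prescribe a toolkit, and all layer sizes, class counts, and tensor shapes here are assumptions of ours, not the original system's configuration):

```python
import torch
import torch.nn as nn

class BLSTMCTC(nn.Module):
    def __init__(self, n_features, n_hidden, n_classes):
        super().__init__()
        # Bidirectional LSTM gives access to past and future input context.
        self.blstm = nn.LSTM(n_features, n_hidden, bidirectional=True)
        # Linear layer maps to the character classes plus the CTC 'blank'.
        self.fc = nn.Linear(2 * n_hidden, n_classes + 1)

    def forward(self, x):            # x: (T, N, n_features)
        h, _ = self.blstm(x)         # h: (T, N, 2 * n_hidden)
        return self.fc(h).log_softmax(dim=-1)

ctc = nn.CTCLoss(blank=0)            # CTC needs no per-frame alignment
model = BLSTMCTC(n_features=9, n_hidden=100, n_classes=80)

x = torch.randn(50, 1, 9)                # 50 frames, 1 sequence
targets = torch.randint(1, 81, (1, 12))  # a 12-character transcription
loss = ctc(model(x), targets,
           input_lengths=torch.tensor([50]),
           target_lengths=torch.tensor([12]))
loss.backward()
```

The key point is that the loss is computed from the unsegmented target transcription alone: CTC sums over all possible alignments internally, so no per-timestep labels are needed.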
The rest of this chapter is organized as follows. Section 2 presents the handwriting data and the feature extraction techniques. Subsequently, Section 3 describes the novel neural network classifier. Experimental results are presented in Section 4. Finally, Section 5 concludes the chapter.