A more satisfactory solution to Sayre's paradox would be to segment and recognize at the same time. Hidden Markov models (HMMs) are able to do this, which is one reason for their popularity in unconstrained handwriting recognition [14], [15], [16], [17], [18], [19]. The idea of applying HMMs to handwriting recognition was originally motivated by their success in speech recognition [20], where a similar conflict exists between recognition and segmentation. Over the years, numerous refinements of the basic HMM approach have been proposed, such as the writer-independent system considered in [7], which combines point-oriented and stroke-oriented input features.
However, HMMs have several well-known drawbacks. One of these is that they assume the probability of each observation depends only on the current state, which makes contextual effects difficult to model. Another is that HMMs are generative, whereas discriminative models generally give better performance on labeling and classification tasks.
Recurrent neural networks (RNNs) do not suffer from these limitations, and would therefore seem a promising alternative to HMMs. However, the application of RNNs alone to handwriting recognition has so far been limited to isolated character recognition (e.g. [21]). One reason for this is that traditional neural network objective functions require a separate training signal for every point in the input sequence, which in turn requires presegmented data.
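For example, the conventional framewise cross-entropy objective has the form

\[ \mathcal{L} = -\sum_{t=1}^{T} \log p(z_t \mid x_t), \]

which demands a target label z_t at every timestep t; supplying such targets presupposes that the input has already been segmented into characters.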
A more successful use of neural networks for handwriting recognition has been to combine them with HMMs in the so-called hybrid approach [22], [23]. A variety of network architectures have been tried for hybrid handwriting recognition, including multilayer perceptrons [24], [25], time delay neural networks (TDNNs) [18], [26], [27], and RNNs [28], [29], [30]. However, although hybrid models alleviate the difficulty of introducing context to HMMs, they still suffer from many of the drawbacks of HMMs, and they do not realize the full potential of RNNs for sequence modeling.
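In a typical hybrid system the network estimates the state posteriors p(s | x_t), which are converted into scaled likelihoods for HMM decoding via Bayes' rule:

\[ p(x_t \mid s) \propto \frac{p(s \mid x_t)}{p(s)}, \]

where p(s) is the prior state frequency estimated from the training set. The network thus improves the emission model, but the HMM's independence assumptions and decoding machinery remain in place.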
1.2 Contribution
This chapter describes a recently introduced alternative approach, in which a single RNN is trained directly for sequence labeling. The network combines a connectionist temporal classification (CTC) output layer with the bidirectional Long Short-Term Memory (BLSTM) architecture, which provides access to long-range input context in both directions. A further enhancement, which allows the network to work in multiple dimensions, will also be presented in this chapter. The so-called Multidimensional LSTM (MDLSTM) is very successful even on raw pixel data.
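The overall training setup can be illustrated with a minimal sketch pairing a bidirectional LSTM with a CTC loss. The sketch below uses PyTorch purely for illustration (the chapter itself does not prescribe a toolkit, and all layer sizes, class counts, and tensor shapes here are assumptions of ours, not the original system's configuration):

```python
import torch
import torch.nn as nn

class BLSTMCTC(nn.Module):
    def __init__(self, n_features, n_hidden, n_classes):
        super().__init__()
        # Bidirectional LSTM gives access to past and future input context.
        self.blstm = nn.LSTM(n_features, n_hidden, bidirectional=True)
        # Linear layer maps to the character classes plus the CTC 'blank'.
        self.fc = nn.Linear(2 * n_hidden, n_classes + 1)

    def forward(self, x):            # x: (T, N, n_features)
        h, _ = self.blstm(x)         # h: (T, N, 2 * n_hidden)
        return self.fc(h).log_softmax(dim=-1)

ctc = nn.CTCLoss(blank=0)            # CTC needs no per-frame alignment
model = BLSTMCTC(n_features=9, n_hidden=100, n_classes=80)

x = torch.randn(50, 1, 9)                # 50 frames, 1 sequence
targets = torch.randint(1, 81, (1, 12))  # a 12-character transcription
loss = ctc(model(x), targets,
           input_lengths=torch.tensor([50]),
           target_lengths=torch.tensor([12]))
loss.backward()
```

The key point is that the loss is computed from the unsegmented target transcription alone: CTC sums over all possible alignments internally, so no per-timestep labels are needed.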
The rest of this chapter is organized as follows. Section 2 presents the handwriting data and the feature extraction techniques. Subsequently, Section 3 describes the novel neural network classifier. Experimental results are presented in Section 4. Finally, Section 5 concludes the chapter.