3.3 Bidirectional Recurrent Neural Networks
For many tasks it is useful to have access to future as well as past context. In
handwriting recognition, for example, the identification of a given letter is helped
by knowing the letters both to the right and left of it. Bidirectional Recurrent
Neural Networks (BRNNs) [35] are able to access context in both directions along
the input sequence. BRNNs contain two separate hidden layers, one of which
processes the inputs forwards, while the other processes them backwards. Both
hidden layers are connected to the output layer, which therefore has access to all
past and future context of every point in the sequence.
Combining BRNNs and LSTM gives bidirectional LSTM (BLSTM) [42].
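As a concrete illustration, the sketch below shows one way such a bidirectional layer can be written in NumPy. The weight matrix names, the tanh hidden units, and the simple additive combination at the output layer are illustrative assumptions, not the exact formulation of [35]; the point is only that one hidden layer scans the inputs forwards, the other backwards, and both feed every output.

import numpy as np

def brnn_forward(x, W_x_fwd, W_h_fwd, W_x_bwd, W_h_bwd, W_out_fwd, W_out_bwd):
    """Bidirectional RNN layer: x has shape (T, input_dim)."""
    T = x.shape[0]
    H = W_h_fwd.shape[0]
    h_fwd = np.zeros((T, H))  # hidden layer that processes the inputs forwards
    h_bwd = np.zeros((T, H))  # hidden layer that processes the inputs backwards
    for t in range(T):
        prev = h_fwd[t - 1] if t > 0 else np.zeros(H)
        h_fwd[t] = np.tanh(x[t] @ W_x_fwd + prev @ W_h_fwd)
    for t in reversed(range(T)):
        nxt = h_bwd[t + 1] if t < T - 1 else np.zeros(H)
        h_bwd[t] = np.tanh(x[t] @ W_x_bwd + nxt @ W_h_bwd)
    # Both hidden layers feed the output layer, so the output at every
    # timestep has access to all past and future context of the sequence.
    return h_fwd @ W_out_fwd + h_bwd @ W_out_bwd

Replacing each tanh recurrence with an LSTM cell yields the BLSTM architecture mentioned above.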
3.4 Connectionist Temporal Classification (CTC)
Standard RNN objective functions require a presegmented input sequence with a
separate target for every segment. This has limited the applicability of RNNs in
domains such as cursive handwriting recognition, where segmentation is difficult
to determine. Moreover, because the outputs of a standard RNN are a series of independent, local classifications, some form of post-processing is required to transform them into the desired label sequence. Connectionist Temporal Classification
(CTC) [36,34] is an RNN output layer specifically designed for sequence labeling
tasks. It does not require the data to be presegmented, and it directly outputs a
probability distribution over label sequences. CTC has been shown to outperform
RNN-HMM hybrids in a speech recognition task [36].
A CTC output layer contains as many units as there are labels in the task, plus
an additional 'blank' or 'no label' unit. The output activations are normalized (using the softmax function), so that they sum to 1 and are each in the range (0, 1):
$$y_k^t = \frac{e^{a_k^t}}{\sum_{k'} e^{a_{k'}^t}},$$

where $a_k^t$ is the unsquashed activation of output unit $k$ at time $t$, and $y_k^t$ is the activation of the same unit after the softmax function is applied.
The above activations are used to estimate the conditional probabilities $p(k, t \mid x)$ of observing the label (or blank) with index $k$ at time $t$ in the input sequence $x$:

$$p(k, t \mid x) = y_k^t$$
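For illustration, the following NumPy sketch computes these normalized outputs for a toy task. The function name and the array a of unsquashed activations (shape T x (K + 1), one column per label plus the blank) are hypothetical choices made for this example.

import numpy as np

def ctc_output_probabilities(a):
    """Softmax over the K + 1 output units at every timestep.

    a: unsquashed activations of shape (T, K + 1), with a[t, k] = a_k^t.
    Returns y with y[t, k] = p(k, t | x).
    """
    a_shifted = a - a.max(axis=1, keepdims=True)      # subtract row max for numerical stability
    exp_a = np.exp(a_shifted)
    return exp_a / exp_a.sum(axis=1, keepdims=True)   # each row sums to 1, entries in (0, 1)

# Toy example: 4 timesteps, 3 labels plus the blank unit.
a = np.random.randn(4, 3 + 1)
y = ctc_output_probabilities(a)
print(y[2, 0])  # estimated probability of label 0 at time t = 2 given the input x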