The conditional probability p(π|x) of observing a particular path π through the lattice of label observations is then found by multiplying together the label and blank probabilities at every time step:

$$p(\pi \mid \mathbf{x}) = \prod_{t=1}^{T} p(\pi_t, t \mid \mathbf{x}) = \prod_{t=1}^{T} y_{\pi_t}^{t}$$

where π_t is the label observed at time t along path π.
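As a concrete illustration (not code from the text), the path probability is just the product of the network's per-timestep output activations along the path. Here `outputs` is a hypothetical T × (|L|+1) array of softmax probabilities, with one column per label plus the blank:

```python
import numpy as np

def path_probability(outputs, path):
    """p(pi|x): product of the chosen label's output probability
    at each time step along the path."""
    assert outputs.shape[0] == len(path)
    p = 1.0
    for t, label in enumerate(path):
        p *= outputs[t, label]
    return p

# Toy example: T = 3 time steps, alphabet {0: blank, 1: 'a', 2: 'b'}
outputs = np.array([[0.1, 0.8, 0.1],
                    [0.2, 0.1, 0.7],
                    [0.6, 0.2, 0.2]])
print(path_probability(outputs, [1, 2, 0]))  # 0.8 * 0.7 * 0.6 = 0.336
```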
Paths are mapped onto label sequences l ∈ L^{≤T}, where L^{≤T} denotes the set of all strings on the alphabet L of length at most T, by an operator B that removes first the repeated labels, then the blanks. For example, both B(a, −, a, b, −) and B(−, a, a, −, −, a, b, b) yield the labelling (a, a, b). Since the paths are mutually exclusive, the conditional probability of a given labelling l ∈ L^{≤T} is the sum of the probabilities of all the paths corresponding to it:

$$p(\mathbf{l} \mid \mathbf{x}) = \sum_{\pi \in \mathcal{B}^{-1}(\mathbf{l})} p(\pi \mid \mathbf{x})$$
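A minimal sketch of the B operator and of the sum over paths, using index 0 for the blank (an assumed convention, not fixed by the text). The brute-force enumeration is exponential in T and is only meant to make the definition concrete:

```python
import itertools
import numpy as np

BLANK = 0  # assumed: index 0 is the blank symbol

def B(path):
    """Collapse operator: remove repeated labels first, then blanks."""
    collapsed = [k for i, k in enumerate(path) if i == 0 or k != path[i - 1]]
    return tuple(k for k in collapsed if k != BLANK)

def labelling_probability(outputs, labelling):
    """Brute-force p(l|x): sum path probabilities over every path
    that collapses to the given labelling (illustration only)."""
    T, n_labels = outputs.shape
    total = 0.0
    for path in itertools.product(range(n_labels), repeat=T):
        if B(path) == tuple(labelling):
            total += np.prod([outputs[t, k] for t, k in enumerate(path)])
    return total

# With a = 1, b = 2: both example paths collapse to (a, a, b)
assert B((1, 0, 1, 2, 0)) == (1, 1, 2)
assert B((0, 1, 1, 0, 0, 1, 2, 2)) == (1, 1, 2)
```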
The above step is what allows the network to be trained with unsegmented data.
The intuition is that, because we don't know where the labels within a particular
transcription will occur, we sum over all the places where they could occur.
In general, a large number of paths will correspond to the same label sequence,
so a naïve calculation of the equation above is infeasible. However, it can be efficiently evaluated using a graph-based algorithm, similar to the forward-backward
algorithm for HMMs. More details about the CTC forward-backward algorithm
appear in [39].
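A dynamic-programming sketch of the forward half of that computation, for readers who want the shape of the algorithm before consulting [39]. Variable names are mine; a practical implementation would work in log space (or rescale per time step) to avoid underflow:

```python
import numpy as np

BLANK = 0  # assumed: index 0 is the blank symbol

def ctc_forward(outputs, labelling):
    """Compute p(l|x) in O(T * |l|) time by dynamic programming over
    the blank-extended label sequence, instead of summing over all
    paths explicitly."""
    T = outputs.shape[0]
    ext = [BLANK]
    for k in labelling:
        ext += [k, BLANK]          # interleave blanks: -, l1, -, l2, -, ...
    S = len(ext)
    alpha = np.zeros((T, S))
    alpha[0, 0] = outputs[0, BLANK]
    if S > 1:
        alpha[0, 1] = outputs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]
            if s >= 1:
                a += alpha[t - 1, s - 1]
            # a "skip" over the preceding blank is allowed unless the
            # current symbol is blank or repeats the symbol two back
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]
            alpha[t, s] = a * outputs[t, ext[s]]
    # paths may end on the final label or the trailing blank
    return alpha[T - 1, S - 1] + (alpha[T - 1, S - 2] if S > 1 else 0.0)
```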
3.5 Multidimensional Recurrent Neural Networks
Ordinary RNNs are designed for time-series and other data with a single spatio-
temporal dimension. However the benefits of RNNs (such as robustness to input
distortion, and flexible use of surrounding context) are also advantageous for mul-
tidimensional data, such as images and video sequences.
Multidimensional recurrent neural networks (MDRNNs) [43, 34], a special case
of Directed Acyclic Graph RNNs [44], generalize the basic structure of RNNs to
multidimensional data. Rather than having a single recurrent connection,
MDRNNs have as many recurrent connections as there are spatio-temporal dimen-
sions in the data. This allows them to access previous context information along
all input directions.
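The recurrence can be sketched for the 2D case: the hidden activation at position (i, j) receives one recurrent input per spatial dimension, from (i−1, j) and (i, j−1). This is a single scan direction, with illustrative weight names (Wx, Wh_i, Wh_j) not taken from the text:

```python
import numpy as np

def mdrnn_2d_scan(x, Wx, Wh_i, Wh_j, b):
    """One unidirectional scan of a 2D MDRNN (sketch): two recurrent
    connections, one along each spatial dimension of the input."""
    H, W, _ = x.shape
    n_hidden = b.shape[0]
    h = np.zeros((H, W, n_hidden))
    for i in range(H):
        for j in range(W):
            a = Wx @ x[i, j] + b
            if i > 0:
                a += Wh_i @ h[i - 1, j]   # recurrence along dimension 1
            if j > 0:
                a += Wh_j @ h[i, j - 1]   # recurrence along dimension 2
            h[i, j] = np.tanh(a)
    return h

# Toy usage on a 4x5 "image" with 3 input channels and 8 hidden units
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5, 3))
Wx = rng.standard_normal((8, 3)) * 0.1
Wh_i = rng.standard_normal((8, 8)) * 0.1
Wh_j = rng.standard_normal((8, 8)) * 0.1
b = np.zeros(8)
h = mdrnn_2d_scan(x, Wx, Wh_i, Wh_j, b)
```

A multidirectional 2D network would run four such scans, one from each corner, and concatenate or sum their hidden activations.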
Multidirectional MDRNNs are the generalization of bidirectional RNNs to multiple dimensions. For an n-dimensional data sequence, 2^n different hidden layers
are used to scan through the data in all directions. As with bidirectional RNNs, all