3.3 Bidirectional Recurrent Neural Networks
For many tasks it is useful to have access to future as well as past context. In
handwriting recognition, for example, the identification of a given letter is helped
by knowing the letters both to the right and left of it. Bidirectional Recurrent
Neural Networks (BRNNs) [35] are able to access context in both directions along
the input sequence. BRNNs contain two separate hidden layers, one of which
processes the inputs forwards, while the other processes them backwards. Both
hidden layers are connected to the output layer, which therefore has access to all
past and future context of every point in the sequence.
Combining BRNNs and LSTM gives bidirectional LSTM (BLSTM) [42].
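As a concrete illustration, the sketch below shows one way such a bidirectional layer can be written in NumPy. The weight matrix names, the tanh hidden units, and the simple additive combination at the output layer are illustrative assumptions, not the exact formulation of [35]; the point is only that one hidden layer scans the inputs forwards, the other backwards, and both feed every output.

import numpy as np

def brnn_forward(x, W_x_fwd, W_h_fwd, W_x_bwd, W_h_bwd, W_out_fwd, W_out_bwd):
    """Bidirectional RNN layer: x has shape (T, input_dim)."""
    T = x.shape[0]
    H = W_h_fwd.shape[0]
    h_fwd = np.zeros((T, H))  # hidden layer that processes the inputs forwards
    h_bwd = np.zeros((T, H))  # hidden layer that processes the inputs backwards
    for t in range(T):
        prev = h_fwd[t - 1] if t > 0 else np.zeros(H)
        h_fwd[t] = np.tanh(x[t] @ W_x_fwd + prev @ W_h_fwd)
    for t in reversed(range(T)):
        nxt = h_bwd[t + 1] if t < T - 1 else np.zeros(H)
        h_bwd[t] = np.tanh(x[t] @ W_x_bwd + nxt @ W_h_bwd)
    # Both hidden layers feed the output layer, so the output at every
    # timestep has access to all past and future context of the sequence.
    return h_fwd @ W_out_fwd + h_bwd @ W_out_bwd

Replacing each tanh recurrence with an LSTM cell yields the BLSTM architecture mentioned above.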
3.4 Connectionist Temporal Classification (CTC)
Standard RNN objective functions require a presegmented input sequence with a
separate target for every segment. This has limited the applicability of RNNs in
domains such as cursive handwriting recognition, where segmentation is difficult
to determine. Moreover, because the outputs of a standard RNN are a series of independent, local classifications, some form of post-processing is required to transform them into the desired label sequence. Connectionist Temporal Classification
(CTC) [36,34] is an RNN output layer specifically designed for sequence labeling
tasks. It does not require the data to be presegmented, and it directly outputs a
probability distribution over label sequences. CTC has been shown to outperform
RNN-HMM hybrids in a speech recognition task [36].
A CTC output layer contains as many units as there are labels in the task, plus
an additional 'blank' or 'no label' unit. The output activations are normalized (using the softmax function), so that they sum to 1 and are each in the range (0, 1):
$$y_k^t = \frac{e^{a_k^t}}{\sum_{k'} e^{a_{k'}^t}},$$

where $a_k^t$ is the unsquashed activation of output unit $k$ at time $t$, and $y_k^t$ is the activation of the same unit after the softmax function is applied.
The above activations are used to estimate the conditional probabilities $p(k, t \mid x)$ of observing the label (or blank) with index $k$ at time $t$ in the input sequence $x$:

$$p(k, t \mid x) = y_k^t$$
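For illustration, the following NumPy sketch computes these normalized outputs for a toy task. The function name and the array a of unsquashed activations (shape T x (K + 1), one column per label plus the blank) are hypothetical choices made for this example.

import numpy as np

def ctc_output_probabilities(a):
    """Softmax over the K + 1 output units at every timestep.

    a: unsquashed activations of shape (T, K + 1), with a[t, k] = a_k^t.
    Returns y with y[t, k] = p(k, t | x).
    """
    a_shifted = a - a.max(axis=1, keepdims=True)      # subtract row max for numerical stability
    exp_a = np.exp(a_shifted)
    return exp_a / exp_a.sum(axis=1, keepdims=True)   # each row sums to 1, entries in (0, 1)

# Toy example: 4 timesteps, 3 labels plus the blank unit.
a = np.random.randn(4, 3 + 1)
y = ctc_output_probabilities(a)
print(y[2, 0])  # estimated probability of label 0 at time t = 2 given the input x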