The conditional probability p(π|x) of observing a particular path π through the lattice of label observations is then found by multiplying together the label and blank probabilities at every time step:

$$p(\pi \mid \mathbf{x}) = \prod_{t=1}^{T} p(\pi_t, t \mid \mathbf{x}) = \prod_{t=1}^{T} y_{\pi_t}^{t}$$

where π_t is the label observed at time t along path π.
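As a concrete illustration (not code from the text), the path probability is just the product of the network's per-timestep output activations along the path. Here `outputs` is a hypothetical T × (|L|+1) array of softmax probabilities, with one column per label plus the blank:

```python
import numpy as np

def path_probability(outputs, path):
    """p(pi|x): product of the chosen label's output probability
    at each time step along the path."""
    assert outputs.shape[0] == len(path)
    p = 1.0
    for t, label in enumerate(path):
        p *= outputs[t, label]
    return p

# Toy example: T = 3 time steps, alphabet {0: blank, 1: 'a', 2: 'b'}
outputs = np.array([[0.1, 0.8, 0.1],
                    [0.2, 0.1, 0.7],
                    [0.6, 0.2, 0.2]])
print(path_probability(outputs, [1, 2, 0]))  # 0.8 * 0.7 * 0.6 = 0.336
```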
Paths are mapped onto label sequences l ∈ L^{≤T}, where L^{≤T} denotes the set of all strings on the alphabet L of length at most T, by an operator B that removes first the repeated labels, then the blanks. For example, both B(a, −, a, b, −) and B(−, a, a, −, −, a, b, b) yield the labelling (a, a, b). Since the paths are mutually exclusive, the conditional probability of a given labelling l ∈ L^{≤T} is the sum of the probabilities of all the paths corresponding to it:

$$p(\mathbf{l} \mid \mathbf{x}) = \sum_{\pi \in \mathcal{B}^{-1}(\mathbf{l})} p(\pi \mid \mathbf{x})$$
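A minimal sketch of the B operator and of the sum over paths, using index 0 for the blank (an assumed convention, not fixed by the text). The brute-force enumeration is exponential in T and is only meant to make the definition concrete:

```python
import itertools
import numpy as np

BLANK = 0  # assumed: index 0 is the blank symbol

def B(path):
    """Collapse operator: remove repeated labels first, then blanks."""
    collapsed = [k for i, k in enumerate(path) if i == 0 or k != path[i - 1]]
    return tuple(k for k in collapsed if k != BLANK)

def labelling_probability(outputs, labelling):
    """Brute-force p(l|x): sum path probabilities over every path
    that collapses to the given labelling (illustration only)."""
    T, n_labels = outputs.shape
    total = 0.0
    for path in itertools.product(range(n_labels), repeat=T):
        if B(path) == tuple(labelling):
            total += np.prod([outputs[t, k] for t, k in enumerate(path)])
    return total

# With a = 1, b = 2: both example paths collapse to (a, a, b)
assert B((1, 0, 1, 2, 0)) == (1, 1, 2)
assert B((0, 1, 1, 0, 0, 1, 2, 2)) == (1, 1, 2)
```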
The above step is what allows the network to be trained with unsegmented data.
The intuition is that, because we don't know where the labels within a particular
transcription will occur, we sum over all the places where they could occur.
In general, a large number of paths will correspond to the same label sequence,
so a naïve calculation of the equation above is infeasible. However, it can be efficiently evaluated using a graph-based algorithm, similar to the forward-backward
algorithm for HMMs. More details about the CTC forward-backward algorithm
appear in [39].
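A dynamic-programming sketch of the forward half of that computation, for readers who want the shape of the algorithm before consulting [39]. Variable names are mine; a practical implementation would work in log space (or rescale per time step) to avoid underflow:

```python
import numpy as np

BLANK = 0  # assumed: index 0 is the blank symbol

def ctc_forward(outputs, labelling):
    """Compute p(l|x) in O(T * |l|) time by dynamic programming over
    the blank-extended label sequence, instead of summing over all
    paths explicitly."""
    T = outputs.shape[0]
    ext = [BLANK]
    for k in labelling:
        ext += [k, BLANK]          # interleave blanks: -, l1, -, l2, -, ...
    S = len(ext)
    alpha = np.zeros((T, S))
    alpha[0, 0] = outputs[0, BLANK]
    if S > 1:
        alpha[0, 1] = outputs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]
            if s >= 1:
                a += alpha[t - 1, s - 1]
            # a "skip" over the preceding blank is allowed unless the
            # current symbol is blank or repeats the symbol two back
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]
            alpha[t, s] = a * outputs[t, ext[s]]
    # paths may end on the final label or the trailing blank
    return alpha[T - 1, S - 1] + (alpha[T - 1, S - 2] if S > 1 else 0.0)
```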
3.5 Multidimensional Recurrent Neural Networks
Ordinary RNNs are designed for time-series and other data with a single spatio-
temporal dimension. However the benefits of RNNs (such as robustness to input
distortion, and flexible use of surrounding context) are also advantageous for mul-
tidimensional data, such as images and video sequences.
Multidimensional recurrent neural networks (MDRNNs) [43, 34], a special case
of Directed Acyclic Graph RNNs [44], generalize the basic structure of RNNs to
multidimensional data. Rather than having a single recurrent connection,
MDRNNs have as many recurrent connections as there are spatio-temporal dimen-
sions in the data. This allows them to access previous context information along
all input directions.
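The recurrence can be sketched for the 2D case: the hidden activation at position (i, j) receives one recurrent input per spatial dimension, from (i−1, j) and (i, j−1). This is a single scan direction, with illustrative weight names (Wx, Wh_i, Wh_j) not taken from the text:

```python
import numpy as np

def mdrnn_2d_scan(x, Wx, Wh_i, Wh_j, b):
    """One unidirectional scan of a 2D MDRNN (sketch): two recurrent
    connections, one along each spatial dimension of the input."""
    H, W, _ = x.shape
    n_hidden = b.shape[0]
    h = np.zeros((H, W, n_hidden))
    for i in range(H):
        for j in range(W):
            a = Wx @ x[i, j] + b
            if i > 0:
                a += Wh_i @ h[i - 1, j]   # recurrence along dimension 1
            if j > 0:
                a += Wh_j @ h[i, j - 1]   # recurrence along dimension 2
            h[i, j] = np.tanh(a)
    return h

# Toy usage on a 4x5 "image" with 3 input channels and 8 hidden units
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5, 3))
Wx = rng.standard_normal((8, 3)) * 0.1
Wh_i = rng.standard_normal((8, 8)) * 0.1
Wh_j = rng.standard_normal((8, 8)) * 0.1
b = np.zeros(8)
h = mdrnn_2d_scan(x, Wx, Wh_i, Wh_j, b)
```

A multidirectional 2D network would run four such scans, one from each corner, and concatenate or sum their hidden activations.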
Multidirectional MDRNNs are the generalization of bidirectional RNNs to multiple dimensions. For an n-dimensional data sequence, 2^n different hidden layers
are used to scan through the data in all directions. As with bidirectional RNNs, all