Finally, the memory block output $h(t)$ is computed by applying $g_{co}(\cdot)$, typically the logistic sigmoid squashing function (19.4) with range $[0,1]$, to the cell state $c(t)$, multiplied by the output gate activation $o(t)$:

$$h(t) = o(t)\, g_{co}\big(c(t)\big) \tag{19.20}$$
The output gate activation $o(t)$ modulates the output of the current memory block and hence determines which activity patterns are transmitted to the other memory blocks, and to itself, in time step $t+1$, effectively controlling read access to the memory block.
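To make the output stage concrete, the following is a minimal NumPy sketch of Eq. (19.20), under the assumption stated above that $g_{co}$ is the logistic sigmoid of (19.4); the variable names c_t, o_t, and h_t are illustrative.

```python
import numpy as np

def logistic(x):
    # Logistic sigmoid squashing function with range [0, 1], cf. (19.4).
    return 1.0 / (1.0 + np.exp(-x))

def block_output(c_t, o_t):
    # Memory block output, Eq. (19.20): h(t) = o(t) * g_co(c(t)).
    # o_t in [0, 1] scales how much of the squashed cell state is exposed,
    # i.e. it controls read access to the memory block.
    return o_t * logistic(c_t)
```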
When first proposed, LSTMs did not have any connections from the cell states $c(t)$ to the input, output, or forget gates that are supposed to control them (cf. dashed arrows in Fig. 19.3). Thus, each gate could only observe the memory block output directly, which is close to zero as long as the output gate is closed. The same problem occurs for multiple cells in a memory block: none of the gates have access to the cells they control if the output gate is closed. This lack of information may lead to sub-optimal network performance.
A solution to this problem was presented in Gers et al. (2002) with the introduction of so-called peepholes: these weighted connections from the cell to all the gates in the memory block allow them to inspect the current cell state $c(t)$, even when the output gate is closed. These peephole connections were found to be necessary in order to obtain well-working network solutions.
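As an illustration, here is a sketch of one LSTM step with peephole connections in the style of Gers et al. (2002). The weight names (W, R, p, b), the use of tanh for the cell input, and the exact update order are assumptions of this sketch rather than details taken from the chapter; note how the input and forget gates peek at the previous cell state, while the output gate peeks at the current one.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step_peephole(x_t, h_prev, c_prev, W, R, p, b):
    # One LSTM step with peephole connections (Gers et al. 2002).
    # W: input weights, R: recurrent weights, b: biases, for the gates
    # i, f, o and the cell input z; p: peephole weight vectors (diagonal)
    # that let each gate inspect the cell state even when the output gate
    # is closed.
    i = logistic(W['i'] @ x_t + R['i'] @ h_prev + p['i'] * c_prev + b['i'])
    f = logistic(W['f'] @ x_t + R['f'] @ h_prev + p['f'] * c_prev + b['f'])
    z = np.tanh(W['z'] @ x_t + R['z'] @ h_prev + b['z'])  # cell input
    c_t = f * c_prev + i * z                              # new cell state
    # The output gate peeks at the *current* cell state c(t):
    o = logistic(W['o'] @ x_t + R['o'] @ h_prev + p['o'] * c_t + b['o'])
    h_t = o * logistic(c_t)  # block output, Eq. (19.20), with g_co = sigmoid
    return h_t, c_t
```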
19.3.2 Bidirectional LSTM
A shortcoming of standard RNNs is that they have access to past but not to future context. Bidirectional RNNs (Schuster and Paliwal 1997) are a solution to this problem. Here, two separate recurrent hidden layers operate on the input sequence in opposite directions, one in the forward direction, the other in the backward direction. Both hidden layers are connected to the same output layer, thus providing access to long-range context in both input directions. The amount of context information that the network actually uses is learned during training and does not have to be specified beforehand. BLSTMs combine the principle of bidirectional networks with the LSTM idea. Of course, resorting to bidirectional networks makes true on-line processing impossible. This may be approximated by a truncated version of BLSTM; in many applications, however, it is sufficient to obtain an output at the end of an utterance, so that both passes, forward and backward, can be used fully during decoding.
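The following is a minimal sketch of this bidirectional wiring, assuming a step function with the signature of the peephole sketch above; the per-step concatenation of the two hidden sequences, which a shared output layer would then consume, is one common choice, not the only one.

```python
import numpy as np

def bidirectional_pass(xs, step_fn, params_fw, params_bw, n_hidden):
    # xs: list of input vectors for t = 1..T.
    # Two separate recurrent layers process the sequence in opposite
    # directions; feeding both into the same output layer gives each
    # time step access to past and future context.
    T = len(xs)
    h_fw, h_bw = [None] * T, [None] * T
    h, c = np.zeros(n_hidden), np.zeros(n_hidden)
    for t in range(T):                    # forward direction
        h, c = step_fn(xs[t], h, c, **params_fw)
        h_fw[t] = h
    h, c = np.zeros(n_hidden), np.zeros(n_hidden)
    for t in reversed(range(T)):          # backward direction
        h, c = step_fn(xs[t], h, c, **params_bw)
        h_bw[t] = h
    # Concatenate both directions per time step for the shared output layer.
    return [np.concatenate([h_fw[t], h_bw[t]]) for t in range(T)]
```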