Finally, the memory block output $h(t)$ is computed by applying $g_{co}(\cdot)$, typically the logistic sigmoid squashing function (19.4) with range $[0,1]$, to the cell state $c(t)$, multiplied by the output gate activation $o(t)$:

$$h(t) = o(t)\, g_{co}\big(c(t)\big) \tag{19.20}$$
The output gate activation $o(t)$ modulates the output of the current memory block and hence determines which activity patterns are transmitted to the other memory blocks, and to itself, in time step $t+1$, effectively controlling read access to the memory block.
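To make the output stage concrete, the following is a minimal NumPy sketch of Eq. (19.20), under the assumption stated above that $g_{co}$ is the logistic sigmoid of (19.4); the variable names c_t, o_t, and h_t are illustrative.

```python
import numpy as np

def logistic(x):
    # Logistic sigmoid squashing function with range [0, 1], cf. (19.4).
    return 1.0 / (1.0 + np.exp(-x))

def block_output(c_t, o_t):
    # Memory block output, Eq. (19.20): h(t) = o(t) * g_co(c(t)).
    # o_t in [0, 1] scales how much of the squashed cell state is exposed,
    # i.e. it controls read access to the memory block.
    return o_t * logistic(c_t)
```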
When first proposed, LSTMs did not have any connections from the cell states $c(t)$ to the input, output, or forget gates that are supposed to control them (cf. dashed arrows in Fig. 19.3). Thus, each gate could only observe the memory block output directly, which is close to zero as long as the output gate is closed. The same problem occurs for multiple cells in a memory block: none of the gates have access to the cells they control if the output gate is closed. This lack of information may lead to sub-optimal network performance.
A solution to this problem was presented in Gers et al. (2002) with the introduction of so-called peepholes: these weighted connections from the cell to all the gates in the memory block allow them to inspect the current cell state $c(t)$, even when the output gate is closed. These peephole connections were found to be necessary in order to obtain well-working network solutions.
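As an illustration, here is a sketch of one LSTM step with peephole connections in the style of Gers et al. (2002). The weight names (W, R, p, b), the use of tanh for the cell input, and the exact update order are assumptions of this sketch rather than details taken from the chapter; note how the input and forget gates peek at the previous cell state, while the output gate peeks at the current one.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step_peephole(x_t, h_prev, c_prev, W, R, p, b):
    # One LSTM step with peephole connections (Gers et al. 2002).
    # W: input weights, R: recurrent weights, b: biases, for the gates
    # i, f, o and the cell input z; p: peephole weight vectors (diagonal)
    # that let each gate inspect the cell state even when the output gate
    # is closed.
    i = logistic(W['i'] @ x_t + R['i'] @ h_prev + p['i'] * c_prev + b['i'])
    f = logistic(W['f'] @ x_t + R['f'] @ h_prev + p['f'] * c_prev + b['f'])
    z = np.tanh(W['z'] @ x_t + R['z'] @ h_prev + b['z'])  # cell input
    c_t = f * c_prev + i * z                              # new cell state
    # The output gate peeks at the *current* cell state c(t):
    o = logistic(W['o'] @ x_t + R['o'] @ h_prev + p['o'] * c_t + b['o'])
    h_t = o * logistic(c_t)  # block output, Eq. (19.20), with g_co = sigmoid
    return h_t, c_t
```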
19.3.2 Bidirectional LSTM
A shortcoming of standard RNNs is that they have access to past but not to future context. Bidirectional RNNs (Schuster and Paliwal 1997) are a solution to this problem. Here, two separate recurrent hidden layers operate on the input sequence in opposite directions, one in the forward direction, the other in the backward direction. Both hidden layers are connected to the same output layer, thus providing access to long-range context in both input directions. The amount of context information that the network actually uses is learned during training and does not have to be specified beforehand. BLSTMs combine the principle of bidirectional networks with the LSTM idea. Of course, resorting to bidirectional networks makes true on-line processing impossible. This may be approximated by a truncated version of BLSTM; in many applications, however, it is sufficient to obtain an output at the end of an utterance, so that both passes, forward and backward, can be used fully during decoding.
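The following is a minimal sketch of this bidirectional wiring, assuming a step function with the signature of the peephole sketch above; the per-step concatenation of the two hidden sequences, which a shared output layer would then consume, is one common choice, not the only one.

```python
import numpy as np

def bidirectional_pass(xs, step_fn, params_fw, params_bw, n_hidden):
    # xs: list of input vectors for t = 1..T.
    # Two separate recurrent layers process the sequence in opposite
    # directions; feeding both into the same output layer gives each
    # time step access to past and future context.
    T = len(xs)
    h_fw, h_bw = [None] * T, [None] * T
    h, c = np.zeros(n_hidden), np.zeros(n_hidden)
    for t in range(T):                    # forward direction
        h, c = step_fn(xs[t], h, c, **params_fw)
        h_fw[t] = h
    h, c = np.zeros(n_hidden), np.zeros(n_hidden)
    for t in reversed(range(T)):          # backward direction
        h, c = step_fn(xs[t], h, c, **params_bw)
        h_bw[t] = h
    # Concatenate both directions per time step for the shared output layer.
    return [np.concatenate([h_fw[t], h_bw[t]]) for t in range(T)]
```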