Be at Odds? Deep and Hierarchical Neural Networks for Classification and Regression of Conflict in Speech - Conflict and Multimodal Communication: Social Research and Machine Intelligence


The input gate activation $i(t)$ at time $t$ is computed by applying the (non-linear) input gate activation function $g_{ig}(\cdot)$ to its inputs:

$$i(t) = g_{ig}\big(W_{ix}\,x(t) + W_{ih}\,h(t-1) + W_{ic}\,c(t-1) + b_i\big), \tag{19.16}$$

where $W_{ix}$, $W_{ih}$, and $W_{ic}$ are the weight matrices that project the input $x(t)$, all (hidden) memory block outputs $h(t-1)$, and the internal cell states $c(t-1)$ from the previous time step, respectively, to the input gate; $b_i$ denotes the input gate bias. Usually, the input gate activation function $g_{ig}$ is chosen to be the sigmoid function (19.4). The activation $i(t)$ of the input gate multiplies the input to all cells in the memory block, and thus determines which activity patterns are stored (added) into it. During training, the input gate learns to open ($i(t) \approx 1$) so as to store relevant inputs in the memory block, and to close ($i(t) \approx 0$) so as to shield it from irrelevant ones.
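The input gate computation can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the dimensions, the random weights, and the large positive or negative biases used to force the gate open or closed are all assumptions made for the example, and `input_gate` is a hypothetical helper name.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, mapping real values into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def input_gate(x_t, h_prev, c_prev, W_ix, W_ih, W_ic, b_i):
    """Input gate activation i(t) following Eq. (19.16), with sigmoid g_ig."""
    return sigmoid(W_ix @ x_t + W_ih @ h_prev + W_ic @ c_prev + b_i)

# Illustrative sizes and random weights (assumptions, not from the text).
rng = np.random.default_rng(0)
n_in, n_cells = 3, 4
x_t = rng.normal(size=n_in)
h_prev = np.zeros(n_cells)   # previous memory block outputs h(t-1)
c_prev = np.zeros(n_cells)   # previous cell states c(t-1)
W_ix = rng.normal(size=(n_cells, n_in))
W_ih = rng.normal(size=(n_cells, n_cells))
W_ic = rng.normal(size=(n_cells, n_cells))

# A strongly positive bias drives the gate towards 1 (open),
# a strongly negative bias drives it towards 0 (closed).
i_open = input_gate(x_t, h_prev, c_prev, W_ix, W_ih, W_ic, np.full(n_cells, 20.0))
i_closed = input_gate(x_t, h_prev, c_prev, W_ix, W_ih, W_ic, np.full(n_cells, -20.0))
```

In a trained network the gate would of course open and close in response to the inputs and learned weights rather than a hand-set bias; the bias is used here only to make the two regimes easy to see.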

Similarly, the activations of the forget gates $f(t)$ can be calculated as

$$f(t) = g_{fg}\big(W_{fx}\,x(t) + W_{fh}\,h(t-1) + W_{fc}\,c(t-1) + b_f\big), \tag{19.17}$$

where $g_{fg}$ is commonly chosen to be the sigmoid function, as for the input gate.

To determine the current state of a cell $c(t)$, we scale the previous state $c(t-1)$ by the activation of the forget gate $f(t)$ and the cell input activations $g_{ci}$ by the activation of the input gate $i(t)$:

$$c(t) = f(t) \odot c(t-1) + i(t) \odot g_{ci}\big(W_{cx}\,x(t) + W_{ch}\,h(t-1) + b_c\big), \tag{19.18}$$

where $g_{ci}$ is a logistic sigmoid function with range $[0, 1]$. At $t = 0$, the cell state of a memory cell is initialized to zero, i.e. $c(0) = 0$. Subsequently, the cell accumulates a sum, discounted by the forget gate, over its input. Hence, activity circulates in the cell $c(t)$ as long as the forget gate remains open ($f(t) \approx 1$). Just as the input gate learns what to store in the memory block, the forget gate learns for how long to retain the information and, once it is outdated, to erase it by resetting the cell state to zero. This prevents the cell state from growing to infinity and enables the memory block to store new data without undue interference from prior operations (Gers et al. 2002).
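A small numerical sketch may help to see how the forget gate discounts and resets the accumulated sum. The gate values and cell inputs below are fixed by hand purely for illustration (in a real network they are computed from the inputs at every time step), and `cell_state` is a hypothetical helper whose `cell_input` argument stands in for the already evaluated $g_{ci}(\cdot)$ term.

```python
import numpy as np

def cell_state(c_prev, f_t, i_t, cell_input):
    """Cell state update in the form of Eq. (19.18), applied element-wise.

    cell_input stands for the already evaluated g_ci(...) term.
    """
    return f_t * c_prev + i_t * cell_input

# Case 1: forget gate fully open (f = 1), input gate open (i = 1).
# The cell simply sums its inputs over three steps, starting from c(0) = 0.
c = np.zeros(2)
for _ in range(3):
    c = cell_state(c, f_t=np.ones(2), i_t=np.ones(2),
                   cell_input=np.array([0.5, 0.25]))
c_accumulated = c.copy()

# Case 2: forget gate and input gate closed (f = 0, i = 0).
# The stored state is erased in a single step.
c_reset = cell_state(c_accumulated, f_t=np.zeros(2), i_t=np.zeros(2),
                     cell_input=np.array([0.5, 0.25]))

# Case 3: forget gate half open (f = 0.5) with a constant input of 1.
# The discounted sum 1 + 0.5 + 0.25 + ... stays below 2 instead of growing.
c_discounted = 0.0
for _ in range(10):
    c_discounted = cell_state(c_discounted, f_t=0.5, i_t=1.0, cell_input=1.0)
```

With the forget gate held fully open, the state keeps growing for as long as input arrives; any value below 1 turns the sum into a bounded, geometrically discounted one.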

The computation of the output gate activations $o(t)$ follows the same principle as the calculation of the input gate activation. However, in this case the current cell states $c(t)$ are considered, rather than the states from the previous time step:

$$o(t) = g_{og}\big(W_{ox}\,x(t) + W_{oh}\,h(t-1) + W_{oc}\,c(t) + b_o\big). \tag{19.19}$$

Here, $g_{og}$ denotes the output gate activation function, which is typically chosen to be the sigmoid function, as for the input gate.
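The order of the four computations matters, because the output gate depends on the freshly updated cell state. The following sketch chains Eqs. (19.16) to (19.19) for a single time step. It assumes logistic sigmoids for all three gates and, by default, for the cell input function $g_{ci}$; the helper names `init_params` and `gates_and_state`, the parameter dictionary layout, the sizes, and the random initialization are all hypothetical choices made for this example.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, mapping real values into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in, n_cells, seed=0):
    """Random illustrative parameters (an assumption, not a training recipe)."""
    rng = np.random.default_rng(seed)
    p = {}
    for gate in "ifo":  # input, forget, and output gates
        p[f"W_{gate}x"] = rng.normal(scale=0.5, size=(n_cells, n_in))
        p[f"W_{gate}h"] = rng.normal(scale=0.5, size=(n_cells, n_cells))
        p[f"W_{gate}c"] = rng.normal(scale=0.5, size=(n_cells, n_cells))
        p[f"b_{gate}"] = np.zeros(n_cells)
    # The cell input has no connection from the cell state in Eq. (19.18).
    p["W_cx"] = rng.normal(scale=0.5, size=(n_cells, n_in))
    p["W_ch"] = rng.normal(scale=0.5, size=(n_cells, n_cells))
    p["b_c"] = np.zeros(n_cells)
    return p

def gates_and_state(x_t, h_prev, c_prev, p, g_ci=sigmoid):
    """One time step of Eqs. (19.16)-(19.19); returns i(t), f(t), c(t), o(t)."""
    # Input and forget gates see the previous cell state c(t-1).
    i_t = sigmoid(p["W_ix"] @ x_t + p["W_ih"] @ h_prev + p["W_ic"] @ c_prev + p["b_i"])
    f_t = sigmoid(p["W_fx"] @ x_t + p["W_fh"] @ h_prev + p["W_fc"] @ c_prev + p["b_f"])
    # The cell state is updated before the output gate is evaluated.
    c_t = f_t * c_prev + i_t * g_ci(p["W_cx"] @ x_t + p["W_ch"] @ h_prev + p["b_c"])
    # The output gate sees the current cell state c(t).
    o_t = sigmoid(p["W_ox"] @ x_t + p["W_oh"] @ h_prev + p["W_oc"] @ c_t + p["b_o"])
    return i_t, f_t, c_t, o_t

p = init_params(n_in=3, n_cells=4)
x_t = np.ones(3)
h_prev = np.zeros(4)
c_prev = np.zeros(4)  # c(0) = 0, as in the text
i_t, f_t, c_t, o_t = gates_and_state(x_t, h_prev, c_prev, p)
```

The sketch stops at the output gate activation; how $o(t)$ is used to form the memory block output is not covered in this passage and is therefore left out.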
