Audio Recognition - Intelligent Audio Analysis


The variable $w_{i,j}$ corresponds to the weight of the connection from unit $i$ to unit $j$, while 'in', 'for', and 'out' refer to the input gate, forget gate, and output gate, respectively (cf. Eqs. 7.46 and 7.50). Indices $i$, $h$, and $c$ count the inputs $x_{i,t}$, the cell outputs $\beta_{h,t-1}$ from other blocks in the hidden layer, and the memory cells, while $I$, $H$, and $C$ are the number of inputs, the number of cells in the hidden layer, and the number of memory cells in one block. Finally, $s_{c,t}$ corresponds to the state of a cell $c$ at time $t$, meaning the activation of the linear cell unit.

Similarly, the activations of the forget gates before and after applying $T_g$ can be calculated as follows:

$$\alpha_{\text{for},t} = \sum_{i=1}^{I} w_{i,\text{for}}\, x_{i,t} + \sum_{h=1}^{H} w_{h,\text{for}}\, \beta_{h,t-1} + \sum_{c=1}^{C} w_{c,\text{for}}\, s_{c,t-1} \tag{7.46}$$

$$\beta_{\text{for},t} = T_g(\alpha_{\text{for},t}) \tag{7.47}$$
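A minimal Python sketch of Eqs. 7.46 and 7.47 for the forget gate of one block follows. It assumes that $T_g$ is the logistic sigmoid (the text above does not fix the squashing function) and that vectors are plain lists; the function and argument names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def weighted_sum(w: Sequence[float], v: Sequence[float]) -> float:
    """Return sum_k w[k] * v[k]; both sequences must have equal length."""
    if len(w) != len(v):
        raise ValueError("weight and activation vectors differ in length")
    return sum(wk * vk for wk, vk in zip(w, v))


def forget_gate(x_t: Sequence[float], beta_prev: Sequence[float],
                s_prev: Sequence[float], w_x: Sequence[float],
                w_h: Sequence[float], w_c: Sequence[float]) -> Tuple[float, float]:
    """Forget gate activation before and after squashing (Eqs. 7.46, 7.47).

    x_t:       inputs x_{i,t}, length I
    beta_prev: hidden-layer cell outputs beta_{h,t-1}, length H
    s_prev:    previous cell states s_{c,t-1} of this block, length C
    w_x, w_h, w_c: weights w_{i,for}, w_{h,for}, w_{c,for}
    """
    alpha = (weighted_sum(w_x, x_t)
             + weighted_sum(w_h, beta_prev)
             + weighted_sum(w_c, s_prev))
    beta = 1.0 / (1.0 + math.exp(-alpha))  # T_g assumed to be the logistic sigmoid
    return alpha, beta


# Hypothetical example with I = 2, H = 1, C = 1.
alpha_for, beta_for = forget_gate([1.0, 0.5], [0.2], [0.0],
                                  [0.4, -0.2], [1.0], [0.3])
```

Under the sigmoid assumption, $\beta_{\text{for},t}$ always lies strictly between 0 and 1, so it acts as a soft switch on how much of $s_{c,t-1}$ survives in Eq. 7.49.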

The memory cell value $\alpha_{c,t}$ is a weighted sum of inputs at time $t$ and hidden unit activations at time $t-1$:

$$\alpha_{c,t} = \sum_{i=1}^{I} w_{i,c}\, x_{i,t} + \sum_{h=1}^{H} w_{h,c}\, \beta_{h,t-1} \tag{7.48}$$
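Eq. 7.48 differs from the gate equations in that no cell-state term enters the sum. The Python sketch below computes $\alpha_{c,t}$ for every cell of a block at once; the row-per-cell weight layout and the names are illustrative assumptions, as is the choice of tanh for $T_i$ in the last line.

```python
import math
from typing import List, Sequence


def cell_inputs(x_t: Sequence[float], beta_prev: Sequence[float],
                w_ic: Sequence[Sequence[float]],
                w_hc: Sequence[Sequence[float]]) -> List[float]:
    """Return alpha_{c,t} for c = 1..C following Eq. 7.48.

    w_ic[c] holds w_{i,c} for all I inputs and w_hc[c] holds w_{h,c}
    for all H hidden cells. No state term s_{c,t-1} appears here.
    """
    alphas = []
    for row_x, row_h in zip(w_ic, w_hc):
        if len(row_x) != len(x_t) or len(row_h) != len(beta_prev):
            raise ValueError("weight row length does not match its input")
        from_inputs = sum(w * x for w, x in zip(row_x, x_t))
        from_hidden = sum(w * b for w, b in zip(row_h, beta_prev))
        alphas.append(from_inputs + from_hidden)
    return alphas


# Hypothetical block with I = 2 inputs, H = 1 hidden cell, C = 2 cells.
alpha_c = cell_inputs([1.0, -1.0], [0.5],
                      w_ic=[[0.25, 0.5], [1.0, 1.0]],
                      w_hc=[[2.0], [0.0]])
# T_i is assumed to be tanh; Eq. 7.49 uses these squashed values.
squashed = [math.tanh(a) for a in alpha_c]
```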

To determine the current state of a cell $c$, the previous state is scaled by the activation of the forget gate, and the input $T_i(\alpha_{c,t})$ is scaled by the activation of the input gate:

$$s_{c,t} = \beta_{\text{for},t}\, s_{c,t-1} + \beta_{\text{in},t}\, T_i(\alpha_{c,t}) \tag{7.49}$$
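Eq. 7.49 is a gated, additive update of a linear unit. A minimal Python sketch, assuming tanh for $T_i$ and gate activations that have already been computed (for the forget gate, via Eq. 7.47); the function name is hypothetical.

```python
import math


def update_state(s_prev: float, beta_for: float, beta_in: float,
                 alpha_c: float) -> float:
    """Cell state s_{c,t} following Eq. 7.49, with T_i assumed to be tanh.

    The forget gate scales the old state; the input gate scales the
    squashed cell input. The state itself is not squashed (linear unit).
    """
    return beta_for * s_prev + beta_in * math.tanh(alpha_c)


# Forget gate open, input gate closed: the state is carried over unchanged.
kept = update_state(2.0, beta_for=1.0, beta_in=0.0, alpha_c=5.0)
# Forget gate half open, zero cell input: the state is halved.
halved = update_state(2.0, beta_for=0.5, beta_in=1.0, alpha_c=0.0)
```

Because the state passes through no squashing function, an open forget gate and a closed input gate preserve it exactly from one step to the next.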

The computation of the output gate activations follows the same principle as the calculation of the input and forget gate activations; however, this time the current state $s_{c,t}$ is considered rather than the state from the previous time step:

$$\alpha_{\text{out},t} = \sum_{i=1}^{I} w_{i,\text{out}}\, x_{i,t} + \sum_{h=1}^{H} w_{h,\text{out}}\, \beta_{h,t-1} + \sum_{c=1}^{C} w_{c,\text{out}}\, s_{c,t} \tag{7.50}$$

$$\beta_{\text{out},t} = T_g(\alpha_{\text{out},t}) \tag{7.51}$$

Finally, the memory cell output is determined as

$$\beta_{c,t} = \beta_{\text{out},t}\, T_o(s_{c,t}) \tag{7.52}$$
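To show how Eqs. 7.46 to 7.52 fit together, here is a hedged Python sketch of one forward step for a single memory block, plus a loop over a sequence. Several details are assumptions not fixed by the text above: $T_g$ is taken to be the logistic sigmoid, $T_i$ and $T_o$ are taken to be tanh, the input gate is assumed to have the same form as Eq. 7.46 with 'in' weights, and the hidden layer is assumed to consist of this one block only. All class and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


def sigmoid(a: float) -> float:
    """T_g, assumed here to be the logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-a))


def wsum(w: Sequence[float], v: Sequence[float]) -> float:
    """Weighted sum sum_k w[k] * v[k] over equal-length sequences."""
    if len(w) != len(v):
        raise ValueError("weight/activation length mismatch")
    return sum(a * b for a, b in zip(w, v))


@dataclass
class GateWeights:
    x: List[float]  # w_{i,gate}, one per input (length I)
    h: List[float]  # w_{h,gate}, one per hidden cell (length H)
    c: List[float]  # w_{c,gate}, one per cell of the block (length C)


@dataclass
class MemoryBlock:
    """One LSTM memory block whose C cells share three gates.

    Hypothetical simplification: the hidden layer is this block alone,
    so H == C and beta_{h,t-1} are the block's own previous outputs.
    """
    w_in: GateWeights
    w_for: GateWeights
    w_out: GateWeights
    w_ic: List[List[float]]  # w_{i,c}: one row of length I per cell
    w_hc: List[List[float]]  # w_{h,c}: one row of length H per cell

    def step(self, x_t: Sequence[float], s_prev: Sequence[float],
             beta_prev: Sequence[float]) -> Tuple[List[float], List[float]]:
        def gate(w: GateWeights, states: Sequence[float]) -> float:
            return sigmoid(wsum(w.x, x_t) + wsum(w.h, beta_prev) + wsum(w.c, states))

        # Input gate (assumed analogous to Eq. 7.46) and forget gate
        # (Eqs. 7.46, 7.47) both use the previous states s_{c,t-1}.
        b_in = gate(self.w_in, s_prev)
        b_for = gate(self.w_for, s_prev)
        # Cell inputs (Eq. 7.48) and new states (Eq. 7.49); T_i assumed tanh.
        s_t = [b_for * s + b_in * math.tanh(wsum(wx, x_t) + wsum(wh, beta_prev))
               for s, wx, wh in zip(s_prev, self.w_ic, self.w_hc)]
        # Output gate uses the current states s_{c,t} (Eqs. 7.50, 7.51).
        b_out = gate(self.w_out, s_t)
        # Cell outputs (Eq. 7.52); T_o assumed tanh.
        beta_t = [b_out * math.tanh(s) for s in s_t]
        return s_t, beta_t


def run_sequence(block: MemoryBlock, xs: Sequence[Sequence[float]],
                 n_cells: int) -> List[List[float]]:
    """Process a sequence from zero initial state; return beta_{c,t} per step."""
    s = [0.0] * n_cells
    beta = [0.0] * n_cells
    outputs = []
    for x_t in xs:
        s, beta = block.step(x_t, s, beta)
        outputs.append(beta)
    return outputs


# Hypothetical block with I = 1 input and C = H = 1 cell.
block = MemoryBlock(
    w_in=GateWeights(x=[1.0], h=[0.0], c=[0.0]),
    w_for=GateWeights(x=[0.0], h=[0.0], c=[0.0]),
    w_out=GateWeights(x=[0.0], h=[0.0], c=[0.0]),
    w_ic=[[1.0]], w_hc=[[0.0]],
)
outs = run_sequence(block, [[2.0], [0.0], [-1.0]], n_cells=1)
```

The ordering inside `step` is the point of the sketch: the output gate can only be evaluated once Eq. 7.49 has produced $s_{c,t}$, whereas the input and forget gates depend only on the current input and on quantities from time $t-1$.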

Note that the initial version of the LSTM architecture contained only input and output gates. Forget gates were added later [26] to allow the memory cells to reset themselves whenever the network needs to forget past inputs.
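The effect of the forget gate can be seen by iterating Eq. 7.49 on a constant cell input. In this hedged sketch, the original gate-less variant is modeled by fixing $\beta_{\text{for},t} = 1$, $T_i$ is assumed to be tanh, and the input gate activation is held constant; the function name and the forget-gate schedule are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def state_trace(cell_inputs: Sequence[float], beta_in: float,
                forget: Optional[Sequence[float]] = None) -> List[float]:
    """Iterate Eq. 7.49 from s_{c,0} = 0 and return s_{c,t} for each step.

    forget=None models the LSTM without forget gates, i.e. beta_for,t = 1
    at every step. T_i is assumed to be tanh.
    """
    if forget is not None and len(forget) != len(cell_inputs):
        raise ValueError("need one forget-gate activation per time step")
    s = 0.0
    trace = []
    for t, alpha_c in enumerate(cell_inputs):
        beta_for = 1.0 if forget is None else forget[t]
        s = beta_for * s + beta_in * math.tanh(alpha_c)
        trace.append(s)
    return trace


steps = [1.0] * 6
no_forget = state_trace(steps, beta_in=1.0)
with_reset = state_trace(steps, beta_in=1.0,
                         forget=[1.0, 1.0, 1.0, 0.0, 1.0, 1.0])
```

Without a forget gate, the state here keeps growing by $\tanh(1)$ at every step; closing the forget gate at the fourth step discards the accumulated value, so accumulation restarts from the current input alone.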
