Digital Signal Processing Reference
In-Depth Information
The variable $w_{i,j}$ corresponds to the weight of the connection from unit $i$ to unit $j$, while 'in', 'for', and 'out' refer to input gate, forget gate, and output gate, respectively (cf. Eqs. 7.46 and 7.50). Indices $i$, $h$, and $c$ count the inputs $x_{i,t}$, the cell outputs from other blocks in the hidden layer, and the memory cells, while $I$, $H$, and $C$ are the number of inputs, the number of cells in the hidden layer, and the number of memory cells in one block. Finally, $s_{c,t}$ corresponds to the state of a cell $c$ at time $t$, meaning the activation of the linear cell unit.
Similarly, the activation of the forget gates before and after applying $T_g$ can be calculated as follows:

$$\alpha_{\mathrm{for},t} = \sum_{i=1}^{I} w_{i,\mathrm{for}}\, x_{i,t} + \sum_{h=1}^{H} w_{h,\mathrm{for}}\, \beta_{h,t-1} + \sum_{c=1}^{C} w_{c,\mathrm{for}}\, s_{c,t-1} \tag{7.46}$$

$$\beta_{\mathrm{for},t} = T_g(\alpha_{\mathrm{for},t}). \tag{7.47}$$
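As an illustration, Eqs. 7.46 and 7.47 can be sketched in plain Python. The function names, the argument names, and the example numbers are hypothetical, and the logistic function is only an assumed choice for $T_g$, which is not specified in this passage.

```python
import math

def sigmoid(a):
    """Logistic function; an assumed choice for the gate squashing function T_g."""
    return 1.0 / (1.0 + math.exp(-a))

def forget_gate(x_t, beta_prev, s_prev, w_in, w_hid, w_cell):
    """Return (alpha_for, beta_for) following Eqs. 7.46 and 7.47."""
    alpha = (
        sum(w * x for w, x in zip(w_in, x_t))            # inputs x_{i,t}, i = 1..I
        + sum(w * b for w, b in zip(w_hid, beta_prev))   # hidden outputs beta_{h,t-1}, h = 1..H
        + sum(w * s for w, s in zip(w_cell, s_prev))     # previous cell states s_{c,t-1}, c = 1..C
    )
    return alpha, sigmoid(alpha)

# Hypothetical example with I = 2 inputs, H = 1 hidden output, C = 1 memory cell.
alpha_for, beta_for = forget_gate(
    x_t=[1.0, 2.0], beta_prev=[0.5], s_prev=[1.0],
    w_in=[0.5, -0.25], w_hid=[1.0], w_cell=[-0.5],
)
```

With these example values the three weighted sums cancel, so the gate sits at the midpoint of the logistic function.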
The memory cell value $\alpha_{c,t}$ is a weighted sum of inputs at time $t$ and hidden unit activations at time $t-1$:

$$\alpha_{c,t} = \sum_{i=1}^{I} w_{i,c}\, x_{i,t} + \sum_{h=1}^{H} w_{h,c}\, \beta_{h,t-1}. \tag{7.48}$$
To determine the current state of a cell $c$, the previous state is scaled by the activation of the forget gate and the input $T_i(\alpha_{c,t})$ by the activation of the input gate:

$$s_{c,t} = \beta_{\mathrm{for},t}\, s_{c,t-1} + \beta_{\mathrm{in},t}\, T_i(\alpha_{c,t}). \tag{7.49}$$
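The effect of the two gates in Eq. 7.49 is perhaps easiest to see with extreme gate values. The sketch below is a minimal illustration: the function name `update_state` and its arguments are hypothetical, and `tanh` is only an assumed choice for $T_i$.

```python
import math

def update_state(s_prev, beta_for, beta_in, alpha_c, squash=math.tanh):
    """Eq. 7.49: gated previous state plus gated, squashed cell input.

    `squash` stands in for T_i; tanh is an assumption, not taken from the text.
    """
    return beta_for * s_prev + beta_in * squash(alpha_c)

# Forget gate fully open, input gate closed: the old state is carried over.
kept = update_state(s_prev=2.0, beta_for=1.0, beta_in=0.0, alpha_c=3.0)

# Forget gate closed, input gate open: the old state is discarded.
reset = update_state(s_prev=2.0, beta_for=0.0, beta_in=1.0, alpha_c=0.0)
```

In practice the gate activations lie strictly between 0 and 1, so the state update is a soft mixture of these two cases.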
The computation of the output gate activations follows the same principle as the calculation of the input and forget gate activations; however, this time the current state $s_{c,t}$ is considered, rather than the state from the previous time step:

$$\alpha_{\mathrm{out},t} = \sum_{i=1}^{I} w_{i,\mathrm{out}}\, x_{i,t} + \sum_{h=1}^{H} w_{h,\mathrm{out}}\, \beta_{h,t-1} + \sum_{c=1}^{C} w_{c,\mathrm{out}}\, s_{c,t} \tag{7.50}$$

$$\beta_{\mathrm{out},t} = T_g(\alpha_{\mathrm{out},t}). \tag{7.51}$$
Finally, the memory cell output is determined as

$$\beta_{c,t} = \beta_{\mathrm{out},t}\, T_o(s_{c,t}). \tag{7.52}$$
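Eqs. 7.50 to 7.52 can be combined into one small routine, sketched below under stated assumptions: the logistic function stands in for $T_g$ and `tanh` for $T_o$, and the name `cell_output`, its parameters, and the example values are hypothetical. Note that the peephole term uses the current states $s_{c,t}$, unlike the forget gate in Eq. 7.46.

```python
import math

def sigmoid(a):
    """Logistic function; an assumed choice for T_g."""
    return 1.0 / (1.0 + math.exp(-a))

def cell_output(x_t, beta_prev, s_t, w_in, w_hid, w_cell, c=0):
    """Return beta_{c,t} for the cell with index c, following Eqs. 7.50-7.52."""
    alpha_out = (
        sum(w * x for w, x in zip(w_in, x_t))            # inputs x_{i,t}
        + sum(w * b for w, b in zip(w_hid, beta_prev))   # hidden outputs beta_{h,t-1}
        + sum(w * s for w, s in zip(w_cell, s_t))        # current cell states s_{c,t}
    )
    beta_out = sigmoid(alpha_out)                        # Eq. 7.51
    return beta_out * math.tanh(s_t[c])                  # Eq. 7.52, tanh assumed for T_o

# Hypothetical example: one input, one hidden output, one memory cell.
out = cell_output(
    x_t=[1.0], beta_prev=[0.0], s_t=[1.0],
    w_in=[-1.0], w_hid=[1.0], w_cell=[1.0],
)
```

Here the weighted sums cancel, so the output gate is half open and the cell emits half of its squashed state.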
Note that the initial version of the LSTM architecture contained only input and output gates. Forget gates were added later [26] in order to allow the memory cells to reset themselves whenever the network needs to forget past inputs.