Information Technology Reference
where $X^t$ is the overall net input at timestamp $t$ and $W_h$ is the weight matrix of the hidden layer. Note that, for clarity, we use this abbreviated form of the complex formula, in which the input weights do not appear directly (they are all hidden in the function g(…)). This formula reveals that the influence of earlier timestamps $t-n$ vanishes rapidly, as the time difference $n$ appears in the exponent of the weight matrix. If the weights in $W_h$ are small (more precisely, if all eigenvalues of $W_h$ have an absolute value smaller than 1), the $n$-th power of $W_h$ approaches zero as $n$ grows.
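The decay of the matrix powers can be checked numerically. The following is a small pure-Python sketch with two hypothetical 2x2 matrices chosen only for illustration. The first has eigenvalues 0.6 and 0.3, so its powers shrink toward zero. For contrast, the second also has all entries below 1, but its eigenvalues are 1.8 and 0, so its powers grow. The vanishing effect therefore depends on the eigenvalues of $W_h$ and not merely on the size of its individual entries.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(w, n):
    """Return the n-th power of w (n >= 1) by repeated multiplication."""
    result = w
    for _ in range(n - 1):
        result = mat_mul(result, w)
    return result


def max_abs_entry(m):
    """Largest absolute entry, used as a rough size measure of a matrix."""
    return max(abs(x) for row in m for x in row)


# Hypothetical hidden-layer weights with eigenvalues 0.6 and 0.3.
w_shrinking = [[0.5, 0.2],
               [0.1, 0.4]]
# All entries are also below 1, but the eigenvalues are 1.8 and 0.
w_growing = [[0.9, 0.9],
             [0.9, 0.9]]

for n in (1, 5, 20):
    print(n, max_abs_entry(mat_pow(w_shrinking, n)),
          max_abs_entry(mat_pow(w_growing, n)))
```

At $n = 20$ the entries of the first power are of the order $10^{-5}$, while those of the second have grown to tens of thousands.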
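Introducing the LSTM cell brings in three new cells, the input, forget, and output gates. Each of them receives the weighted sum of the hidden-layer outputs of the previous timestamp as an input, in addition to the current net input and the cell state. For the input gate, for example: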
$$a_\iota^t = W_{i,\iota} X^t + W_{h,\iota} B^{t-1} + w_{c,\iota} s_c^{t-1}$$
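where $s_c^{t-1}$ is the cell state of the previous timestamp, $B^{t-1}$ is the output of the hidden layer at the previous timestamp, $W_{i,\iota}$ and $W_{h,\iota}$ are the weights for the current net input and for the hidden-layer output of the previous timestamp, respectively, and $w_{c,\iota}$ weights the connection from the cell state to the gate. The activation of the forget gate is: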
$$a_\theta^t = W_{i,\theta} X^t + W_{h,\theta} B^{t-1} + w_{c,\theta} s_c^{t-1}$$
which is the same formula, only with different weights (those trained for the forget gate). The cell activation is usually calculated by:
$$a_c^t = W_{i,c} X^t + W_{h,c} B^{t-1}$$
However, the cell state is then obtained by weighting the cell activation and the previous cell state with the outputs of the two gates:
$$s_c^t = \sigma(a_\iota^t)\, g(a_c^t) + \sigma(a_\theta^t)\, s_c^{t-1}$$
where $\sigma$ indicates that the sigmoid function is used as the squashing function for the gates and $g(\cdot)$ is the cell's activation function. As the sigmoid function often returns a value close to zero or one, the formula can be interpreted as:
$$s_c^t = [0 \text{ or } 1]\, g(a_c^t) + [0 \text{ or } 1]\, s_c^{t-1}$$
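or, in words: the cell state depends either on the input activation (if the input gate opens, i.e., the first weight is close to 1) or on the previous cell state (if the forget gate opens, i.e., the second weight is close to 1); if both gates are open, the two terms add up. This property enables the LSTM cell to bridge long time periods: as long as the forget gate stays open and the input gate stays closed, the cell state is passed on almost unchanged from one timestamp to the next instead of being repeatedly multiplied by $W_h$. The value of the output gate is calculated similarly to the other gates, i.e.: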
$$a_\omega^t = W_{i,\omega} X^t + W_{h,\omega} B^{t-1} + w_{c,\omega} s_c^{t}$$
where, unlike the other two gates, the output gate uses the current cell state $s_c^t$. The final cell output is:
$$b_c^t = \sigma(a_\omega^t)\, h(s_c^t)$$
which, again, is either close to zero or close to the usual output of the cell, $h(s_c^t)$.
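The equations above can be collected into a single forward step. The following is a minimal Python sketch for one memory cell, not the author's implementation. It assumes $\tanh$ for both $g$ and $h$ (the text leaves these functions open), zero initial states, and hypothetical weights. Because the formulas contain no bias terms, a constant input of 1.0 is appended to $X^t$ to act as one; this is also an assumption made only for the demonstration.

```python
import math
from dataclasses import dataclass


def sigmoid(x: float) -> float:
    """Logistic squashing function used for the gates."""
    return 1.0 / (1.0 + math.exp(-x))


def dot(w: list[float], v: list[float]) -> float:
    return sum(wi * vi for wi, vi in zip(w, v))


@dataclass
class GateWeights:
    w_in: list[float]      # W_{i,*}: weights for the current net input X^t
    w_hidden: list[float]  # W_{h,*}: weights for the hidden outputs B^{t-1}
    w_cell: float = 0.0    # w_{c,*}: weight from the cell state (not used for a_c)


def lstm_step(x, b_prev, s_prev, ig, fg, cell, og, g=math.tanh, h=math.tanh):
    """One timestamp of a single LSTM memory cell; returns (b_c^t, s_c^t)."""
    # Input gate (iota) and forget gate (theta) see the previous cell state.
    a_iota = dot(ig.w_in, x) + dot(ig.w_hidden, b_prev) + ig.w_cell * s_prev
    a_theta = dot(fg.w_in, x) + dot(fg.w_hidden, b_prev) + fg.w_cell * s_prev
    # Cell activation: no connection from the cell state.
    a_c = dot(cell.w_in, x) + dot(cell.w_hidden, b_prev)
    # New cell state, weighted by the outputs of the two gates.
    s = sigmoid(a_iota) * g(a_c) + sigmoid(a_theta) * s_prev
    # Output gate (omega) uses the current cell state s_c^t.
    a_omega = dot(og.w_in, x) + dot(og.w_hidden, b_prev) + og.w_cell * s
    b = sigmoid(a_omega) * h(s)
    return b, s


def remembered_state(forget_bias: float, steps: int = 50) -> float:
    """Store a value at t = 0, then feed zeros; return the final cell state.

    Hypothetical weights. X^t = [signal, 1.0]; the constant 1.0 acts as a bias.
    """
    ig = GateWeights(w_in=[20.0, -10.0], w_hidden=[0.0])  # opens only when signal = 1
    fg = GateWeights(w_in=[0.0, forget_bias], w_hidden=[0.0])
    cell = GateWeights(w_in=[2.0, 0.0], w_hidden=[0.0])
    og = GateWeights(w_in=[0.0, 10.0], w_hidden=[0.0])    # output gate kept open
    b, s = [0.0], 0.0
    for signal in [1.0] + [0.0] * steps:
        out, s = lstm_step([signal, 1.0], b, s, ig, fg, cell, og)
        b = [out]
    return s


print(remembered_state(10.0), remembered_state(-10.0))
```

With the forget gate held open (`forget_bias = 10.0`), the value stored at the first timestamp survives the following 50 steps almost unchanged. With it held closed (`forget_bias = -10.0`), the cell state collapses toward zero after a single step.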