The gradient-based learning algorithm of LSTM presented in [103] can be modified such that the empirical MMSE risk functional is replaced by $H_{R_2}$. The change in the derivation presented in [103] occurs in the following expression (the backpropagation error seen at the output neuron $k$):
\[
E_k = f'(\mathrm{net}_k)\,(T_k - Y_k), \qquad (6.25)
\]
where $f(\cdot)$ is the sigmoid transfer function, $\mathrm{net}_k$ is the activation of the output neuron $k$ at time $\tau$, $T_k$ is the target variable for the output neuron $k$ at time $\tau$ and $Y_k$ is the output of neuron $k$ at time $\tau$. The term $(T_k - Y_k)$ in equation (6.25) comes from the derivative of the MSE,
\[
\frac{1}{n}\sum_{i=1}^{n}(t_i - y_i)^2,
\]
w.r.t. the output $y_k$. This same derivative is now computed using expression (6.24). Note that, since the logarithm in (6.24) is a monotonically increasing function, minimizing it is the same as minimizing its operand. So, the partial derivative of the operand will be derived, which is
\[
\frac{\partial}{\partial y_k}\left[\frac{1}{n^2 h \sqrt{2\pi}}\sum_{i=1}^{n}\sum_{j=1}^{n}\exp\!\left(-\frac{(e_i - e_j)^2}{2h^2}\right)\right]
=
\frac{\partial}{\partial y_k}\left[\frac{1}{n^2 h \sqrt{2\pi}}\sum_{i=1}^{n}\sum_{j=1}^{n}\exp\!\left(-\frac{(t_i - y_i - t_j + y_j)^2}{2h^2}\right)\right]. \qquad (6.26)
\]
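For concreteness, the bracketed quantity in (6.26) (the operand of the logarithm in (6.24)) can be evaluated directly from a batch of targets and outputs. The following NumPy sketch is illustrative only; the function name and the toy values are assumptions, not part of the original derivation.

```python
import numpy as np

def operand_of_6_24(t, y, h):
    """Operand of the logarithm in (6.24), written out as in (6.26):
    a Gaussian-kernel double sum over all pairs of errors e_i = t_i - y_i."""
    e = t - y                                  # errors e_i = t_i - y_i
    diff = e[:, None] - e[None, :]             # all pairwise differences e_i - e_j
    n = len(e)
    const = 1.0 / (n**2 * h * np.sqrt(2.0 * np.pi))
    return const * np.exp(-diff**2 / (2.0 * h**2)).sum()

# Toy example (illustrative values only).
t = np.array([0.9, 0.1, 0.8, 0.2])   # targets t_i
y = np.array([0.7, 0.3, 0.6, 0.1])   # network outputs y_i
print(operand_of_6_24(t, y, h=0.5))
```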
Now, when $i = k$, the derivative of the term inside the summation becomes
\[
\exp\!\left(-\frac{(t_k - y_k - t_j + y_j)^2}{2h^2}\right)\left(-\frac{1}{2h^2}\right)2\,(t_k - y_k - t_j + y_j)(-1). \qquad (6.27)
\]
Likewise, if $j = k$, the derivative becomes
\[
\exp\!\left(-\frac{(t_i - y_i - t_k + y_k)^2}{2h^2}\right)\left(-\frac{1}{2h^2}\right)2\,(t_i - y_i - t_k + y_k). \qquad (6.28)
\]
Expressions (6.27) and (6.28) yield the same values, allowing the derivative of the operand of (6.24) to be written as
\[
Q\sum_{i=1}^{n}\exp\!\left(-\frac{(t_i - y_i - t_k + y_k)^2}{2h^2}\right)(t_i - y_i - t_k + y_k), \qquad (6.29)
\]
where
\[
Q = \frac{2}{n^2 h^3 \sqrt{2\pi}}.
\]
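As a check on (6.29), the sketch below differentiates the operand of (6.26) by central finite differences with respect to each output $y_k$ and compares the result, in magnitude, with $Q$ times the single sum over $i$. The helper names, step size and toy values are assumptions for illustration, and only magnitudes are compared because the sign of each term depends on the ordering convention chosen for the difference $t_i - y_i - t_k + y_k$.

```python
import numpy as np

def operand(t, y, h):
    """Operand of the logarithm in (6.24), as written in (6.26)."""
    e = t - y
    diff = e[:, None] - e[None, :]
    n = len(e)
    return np.exp(-diff**2 / (2.0 * h**2)).sum() / (n**2 * h * np.sqrt(2.0 * np.pi))

def single_sum(t, y, h, k):
    """Sum over i in (6.29): exp(-a_ik^2 / (2 h^2)) a_ik, a_ik = t_i - y_i - t_k + y_k."""
    a = (t - y) - (t[k] - y[k])
    return np.sum(np.exp(-a**2 / (2.0 * h**2)) * a)

t = np.array([0.9, 0.1, 0.8, 0.2])    # illustrative targets
y = np.array([0.7, 0.3, 0.6, 0.1])    # illustrative outputs
h, eps, n = 0.5, 1e-6, len(y)
Q = 2.0 / (n**2 * h**3 * np.sqrt(2.0 * np.pi))

for k in range(n):
    yp, ym = y.copy(), y.copy()
    yp[k] += eps
    ym[k] -= eps
    fd = (operand(t, yp, h) - operand(t, ym, h)) / (2.0 * eps)   # finite difference
    # The magnitudes coincide; the sign follows the convention used for a_ik.
    print(k, abs(fd), abs(Q * single_sum(t, y, h, k)))
```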
So expression (6.25) becomes
\[
E_k = Q\, f'_k(\mathrm{net}_k)\sum_{i=1}^{n}\exp\!\left(-\frac{a_{ik}^2}{2h^2}\right)a_{ik}, \qquad (6.30)
\]
with $a_{ik} = t_i - y_i - t_k + y_k$.
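Putting (6.25) and (6.30) side by side, the sketch below computes both output-error signals for a vector of output activations, assuming sigmoid output units so that $f'(\mathrm{net}) = f(\mathrm{net})(1 - f(\mathrm{net}))$, and taking $a_{ik} = t_i - y_i - t_k + y_k$ as in (6.29); all names and values are illustrative, not from [103].

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mse_error(net, t):
    """Output error of (6.25): E_k = f'(net_k) (T_k - Y_k)."""
    y = sigmoid(net)
    return y * (1.0 - y) * (t - y)

def entropy_error(net, t, h):
    """Output error of (6.30): E_k = Q f'_k(net_k) sum_i exp(-a_ik^2/(2h^2)) a_ik."""
    y = sigmoid(net)
    n = len(y)
    Q = 2.0 / (n**2 * h**3 * np.sqrt(2.0 * np.pi))
    e = t - y
    A = e[:, None] - e[None, :]                            # A[i, k] = a_ik = e_i - e_k
    S = np.sum(np.exp(-A**2 / (2.0 * h**2)) * A, axis=0)   # sum over i for each k
    return Q * y * (1.0 - y) * S

net = np.array([1.2, -0.4, 0.3, 2.0])   # illustrative output activations net_k
t = np.array([0.9, 0.1, 0.8, 0.2])      # illustrative targets T_k
print(mse_error(net, t))
print(entropy_error(net, t, h=0.5))
```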