with direct and weights with buffered access. All units are updated in each time step
in top-down order. Only units that have already been updated in the same step are
sources of direct-access weights.
The computation performed by the RNN is equivalent to that of an FFNN constructed
by unfolding the RNN in time. Part (b) of the figure shows the unfolded network.
The units and the direct-access weights are copied for each time step. The
buffered-access weights $w_{ij}$ now connect unit $i$ in time step $t$ with
unit $j$ in step $(t+1)$.
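To make the unfolding concrete, the following sketch runs such a network forward for a fixed number of steps, keeping a copy of the unit activities for every time step. It is only an illustration: the names (unfold_forward, W_in, W_dir, W_buf), the tanh transfer function, and the treatment of the external input are assumptions, not taken from the text.

```python
import numpy as np

def unfold_forward(x_seq, W_in, W_dir, W_buf, o0, f=np.tanh):
    """Forward pass of a small recurrent network, unfolded over time.

    Assumed, illustrative conventions:
      x_seq : (T, n_in)  external input for each time step
      W_in  : (n, n_in)  weights from the external input
      W_dir : (n, n)     direct-access weights; W_dir[j, i] connects unit i
                         to unit j *within* a step, so with top-down update
                         order only entries with i < j may be non-zero
      W_buf : (n, n)     buffered-access weights; W_buf[i, j] = w_ij connects
                         unit i at step t with unit j at step t+1
      o0    : (n,)       initial unit activities at t = 0
    Returns the activity vectors o^(0), ..., o^(T), one copy per time step.
    """
    n = o0.shape[0]
    outputs = [o0.copy()]
    for t in range(len(x_seq)):
        o_prev = outputs[-1]
        o_new = np.zeros(n)
        for j in range(n):                       # top-down update order
            net = (W_in[j] @ x_seq[t]
                   + W_buf[:, j] @ o_prev        # buffered access to step t
                   + W_dir[j, :j] @ o_new[:j])   # direct access to units updated earlier
            o_new[j] = f(net)
        outputs.append(o_new)
    return outputs
```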
Since the unfolded network is a directed acyclic graph, the error backpropagation
technique can be applied. It propagates error components in reverse update order
and hence backwards in time. Two simple modifications to the generic
backpropagation algorithm are necessary for BPTT.
First, the output units of FFNNs were sinks of the graph, with no units accessing
their activity. This is different in RNNs, where the activity of output units is
fed back into the network. Hence, the error component $\delta_k^{(i,t)}$ of an
output unit $k$ for example $i$ depends not only on the direct contribution
$e_k^{(i,t)} = \gamma_t \bigl( o_k^{(i,t)} - y_k^{(i,t)} \bigr)$ from the cost
function (6.14) for time $t$; the backpropagated error
$\sum_l w_{kl}\,\delta_l^{(i,t')}$ arriving from nodes $l$ accessing it must also
be considered. The source time $t'$ of the error components is either the same
step $t$ if the unit is accessed directly, or the next step $(t+1)$ for buffered
access. Both contributions must be added before the combined error can be
multiplied with the derivative of the transfer function $f_k$:
$$
\delta_k^{(i,t)} = \left( e_k^{(i,t)} + \sum_l w_{kl}\,\delta_l^{(i,t')} \right)
\left.\frac{df_k}{d\xi}\right|_{\xi_k^{(i,t)}} , \qquad (6.15)
$$
where $\xi_k^{(i,t)}$ denotes the net activity of unit $k$ for example $i$ at time $t$.
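A minimal backward pass implementing Eq. (6.15) could then look as follows. It reuses the conventions of the forward sketch above, treats the targets and the weighting factors $\gamma_t$ as dictionaries keyed by time step, and evaluates $f_k'$ from the unit output (as is possible for tanh units, where $f'(\xi) = 1 - f(\xi)^2$); these choices are illustrative assumptions, not the text's implementation.

```python
def bptt_deltas(outputs, targets, gammas, W_dir, W_buf,
                f_prime=lambda o: 1.0 - o ** 2):
    """Error components delta_k^(t) of Eq. (6.15), computed in reverse update
    order and hence backwards in time (illustrative sketch)."""
    T = len(outputs) - 1
    n = outputs[0].shape[0]
    deltas = [np.zeros(n) for _ in range(T + 1)]
    for t in range(T, 0, -1):
        delta_t = np.zeros(n)
        for k in range(n - 1, -1, -1):           # reverse of the top-down order
            err = 0.0
            if t in targets:                     # direct contribution e_k^(t)
                err += gammas[t] * (outputs[t][k] - targets[t][k])
            # units l accessing k directly contribute from the same step t
            err += W_dir[k + 1:, k] @ delta_t[k + 1:]
            # units l accessing k through buffered weights contribute from t+1
            if t + 1 <= T:
                err += W_buf[k, :] @ deltas[t + 1]
            delta_t[k] = err * f_prime(outputs[t][k])  # multiply with f_k'
        deltas[t] = delta_t
    return deltas
```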
The second modification needed for BPTT was already used for shared weights
in FFNNs. BPTT produces additional weight sharing because a weight is replicated
for each time step. As before, the weight updates computed for the individual weight
instances must be added to compute the update for the shared weight.
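Under the same conventions, summing the contributions of the individual weight instances over all time steps might look like this (again a sketch, with names carried over from the blocks above):

```python
def bptt_weight_gradients(x_seq, outputs, deltas):
    """Add up the weight-update contributions of every time-step instance of
    each shared weight (illustrative sketch)."""
    n = outputs[0].shape[0]
    T = len(outputs) - 1
    g_in = np.zeros((n, x_seq.shape[1]))
    g_dir = np.zeros((n, n))
    g_buf = np.zeros((n, n))
    for t in range(1, T + 1):
        # contributions of the weight copies used to compute o^(t)
        g_in += np.outer(deltas[t], x_seq[t - 1])
        g_dir += np.tril(np.outer(deltas[t], outputs[t]), k=-1)  # same-step sources
        g_buf += np.outer(outputs[t - 1], deltas[t])             # sources from step t-1
    return g_in, g_dir, g_buf
```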
Since BPTT propagates the error backwards through time until it reaches the
initial time step t = 0 , it can not only be used to adapt the weights of the network,
but also to modify the initial activities of the units.
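Under the conventions of the sketches above, the initial activities influence the cost only through the buffered-access weights into step 1, so a gradient step on them can be taken alongside the weight updates; the learning rate eta is a hypothetical choice.

```python
# d E / d o^(0): in this sketch the initial activities feed only into step 1
# through the buffered-access weights, so the gradient is a single product.
eta = 0.01                      # hypothetical learning rate
grad_o0 = W_buf @ deltas[1]     # sum_j w_kj * delta_j^(1) for each unit k
o0 = o0 - eta * grad_o0         # adapt the initial activities as well
```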
6.3.2 Real-Time Recurrent Learning
The BPTT algorithm presented above is very efficient, requiring only $O(1)$ operations
per weight instance, but it is a batch method that needs to store the entire history
of the recurrent computation for the error backpropagation.
Williams and Zipser [241] proposed computing the gradient of the cost func-
tion (6.14) using forward propagation. The resulting algorithm is called real-time
recurrent learning (RTRL).
RTRL maintains quantities $\pi_{jkl}^{(i,t)} = \partial o_j^{(i,t)} / \partial w_{kl}$
that represent the sensitivity of a unit $j$ with respect to a weight from unit $k$
to unit $l$. They are initialized to zero for $t = 0$:
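A minimal sketch of how these sensitivities might be stored, initialized, and carried forward is given below. To keep it short it considers only the buffered-access recurrent weights, holds $\pi_{jkl}$ in a three-dimensional array pi[j, k, l], and evaluates the transfer-function derivative from the unit output; the recursion it implements is the standard RTRL update of Williams and Zipser under these simplifying assumptions, not a transcription of the equation from the text.

```python
import numpy as np

def rtrl_step(pi, W_buf, o_prev, o_new, f_prime=lambda o: 1.0 - o ** 2):
    """Advance the sensitivities pi[j, k, l] = d o_j / d w_kl by one time step
    for a network with buffered-access weights only, o^(t+1) = f(W_buf^T o^(t)).
    Illustrative sketch of the standard RTRL recursion."""
    n = o_prev.shape[0]
    pi_new = np.zeros_like(pi)
    for j in range(n):
        # how the sensitivities of the previous outputs flow into unit j
        recur = np.tensordot(W_buf[:, j], pi, axes=(0, 0))   # sum_i w_ij * pi[i]
        # w_kl appears in unit j's net input only if l == j, with factor o_k(t)
        direct = np.zeros((n, n))
        direct[:, j] = o_prev
        pi_new[j] = f_prime(o_new[j]) * (recur + direct)
    return pi_new

# Initialization for t = 0, as stated in the text: all sensitivities are zero.
# pi = np.zeros((n, n, n))
```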