with direct and weights with buffered access. All units are updated in each time step
in top-down order. Only units that have already been updated in the same step are
sources of direct-access weights.
The computation performed by the RNN is equivalent to that of an FFNN constructed
by unfolding the RNN in time. Part (b) of the figure shows the unfolded network.
The units and the direct-access weights are copied for each time step. The
buffered-access weights $w_{ij}$ now connect unit $i$ in time step $t$ with
unit $j$ in step $(t+1)$.
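To make the unfolding concrete, the following sketch runs such a network forward for a fixed number of steps, keeping a copy of the unit activities for every time step. It is only an illustration: the names (unfold_forward, W_in, W_dir, W_buf), the tanh transfer function, and the treatment of the external input are assumptions, not taken from the text.

```python
import numpy as np

def unfold_forward(x_seq, W_in, W_dir, W_buf, o0, f=np.tanh):
    """Forward pass of a small recurrent network, unfolded over time.

    Assumed, illustrative conventions:
      x_seq : (T, n_in)  external input for each time step
      W_in  : (n, n_in)  weights from the external input
      W_dir : (n, n)     direct-access weights; W_dir[j, i] connects unit i
                         to unit j *within* a step, so with top-down update
                         order only entries with i < j may be non-zero
      W_buf : (n, n)     buffered-access weights; W_buf[i, j] = w_ij connects
                         unit i at step t with unit j at step t+1
      o0    : (n,)       initial unit activities at t = 0
    Returns the activity vectors o^(0), ..., o^(T), one copy per time step.
    """
    n = o0.shape[0]
    outputs = [o0.copy()]
    for t in range(len(x_seq)):
        o_prev = outputs[-1]
        o_new = np.zeros(n)
        for j in range(n):                       # top-down update order
            net = (W_in[j] @ x_seq[t]
                   + W_buf[:, j] @ o_prev        # buffered access to step t
                   + W_dir[j, :j] @ o_new[:j])   # direct access to units updated earlier
            o_new[j] = f(net)
        outputs.append(o_new)
    return outputs
```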
Since the unfolded network is a directed acyclic graph, the error backpropagation
technique can be applied. It propagates error components in reverse update order
and hence backwards in time. Two simple modifications to the generic
backpropagation algorithm are necessary for BPTT.
First, the output units of FFNNs were sinks of the graph, with no units accessing
their activity. This is different in RNNs, where the activity of output units is
fed back into the network. Hence, the error component $\delta_k^{(i,t)}$ of an
output unit $k$ for example $i$ depends not only on the direct contribution
$e_k^{(i,t)} = \gamma_t \bigl( o_k^{(i,t)} - y_k^{(i,t)} \bigr)$ from the cost
function (6.14) for time $t$; the backpropagated error
$\sum_l w_{kl}\,\delta_l^{(i,t')}$ arriving from nodes $l$ accessing it must also
be considered. The source time $t'$ of the error components is either the same
step $t$ if the unit is accessed directly, or the next step $(t+1)$ for buffered
access. Both contributions must be added before the combined error can be
multiplied with the derivative of the transfer function $f_k$:
$$
\delta_k^{(i,t)} = \left( e_k^{(i,t)} + \sum_l w_{kl}\,\delta_l^{(i,t')} \right)
\left.\frac{df_k}{d\xi}\right|_{\xi_k^{(i,t)}} , \qquad (6.15)
$$
where $\xi_k^{(i,t)}$ denotes the net activity of unit $k$ for example $i$ at time $t$.
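A minimal backward pass implementing Eq. (6.15) could then look as follows. It reuses the conventions of the forward sketch above, treats the targets and the weighting factors $\gamma_t$ as dictionaries keyed by time step, and evaluates $f_k'$ from the unit output (as is possible for tanh units, where $f'(\xi) = 1 - f(\xi)^2$); these choices are illustrative assumptions, not the text's implementation.

```python
def bptt_deltas(outputs, targets, gammas, W_dir, W_buf,
                f_prime=lambda o: 1.0 - o ** 2):
    """Error components delta_k^(t) of Eq. (6.15), computed in reverse update
    order and hence backwards in time (illustrative sketch)."""
    T = len(outputs) - 1
    n = outputs[0].shape[0]
    deltas = [np.zeros(n) for _ in range(T + 1)]
    for t in range(T, 0, -1):
        delta_t = np.zeros(n)
        for k in range(n - 1, -1, -1):           # reverse of the top-down order
            err = 0.0
            if t in targets:                     # direct contribution e_k^(t)
                err += gammas[t] * (outputs[t][k] - targets[t][k])
            # units l accessing k directly contribute from the same step t
            err += W_dir[k + 1:, k] @ delta_t[k + 1:]
            # units l accessing k through buffered weights contribute from t+1
            if t + 1 <= T:
                err += W_buf[k, :] @ deltas[t + 1]
            delta_t[k] = err * f_prime(outputs[t][k])  # multiply with f_k'
        deltas[t] = delta_t
    return deltas
```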
The second modification needed for BPTT was already used for shared weights
in FFNNs. BPTT produces additional weight sharing because a weight is replicated
for each time step. As before, the weight updates computed for the individual weight
instances must be added to compute the update for the shared weight.
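Under the same conventions, summing the contributions of the individual weight instances over all time steps might look like this (again a sketch, with names carried over from the blocks above):

```python
def bptt_weight_gradients(x_seq, outputs, deltas):
    """Add up the weight-update contributions of every time-step instance of
    each shared weight (illustrative sketch)."""
    n = outputs[0].shape[0]
    T = len(outputs) - 1
    g_in = np.zeros((n, x_seq.shape[1]))
    g_dir = np.zeros((n, n))
    g_buf = np.zeros((n, n))
    for t in range(1, T + 1):
        # contributions of the weight copies used to compute o^(t)
        g_in += np.outer(deltas[t], x_seq[t - 1])
        g_dir += np.tril(np.outer(deltas[t], outputs[t]), k=-1)  # same-step sources
        g_buf += np.outer(outputs[t - 1], deltas[t])             # sources from step t-1
    return g_in, g_dir, g_buf
```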
Since BPTT propagates the error backwards through time until it reaches the
initial time step t = 0 , it can not only be used to adapt the weights of the network,
but also to modify the initial activities of the units.
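Under the conventions of the sketches above, the initial activities influence the cost only through the buffered-access weights into step 1, so a gradient step on them can be taken alongside the weight updates; the learning rate eta is a hypothetical choice.

```python
# d E / d o^(0): in this sketch the initial activities feed only into step 1
# through the buffered-access weights, so the gradient is a single product.
eta = 0.01                      # hypothetical learning rate
grad_o0 = W_buf @ deltas[1]     # sum_j w_kj * delta_j^(1) for each unit k
o0 = o0 - eta * grad_o0         # adapt the initial activities as well
```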
6.3.2 Real-Time Recurrent Learning
The BPTT algorithm presented above is very efficient, requiring only $O(1)$ operations
per weight instance, but it is a batch method that needs to store the entire history
of the recurrent computation for the error backpropagation.
Williams and Zipser [241] proposed computing the gradient of the cost func-
tion (6.14) using forward propagation. The resulting algorithm is called real-time
recurrent learning (RTRL).
RTRL maintains quantities $\pi_{jkl}^{(i,t)} = \partial o_j^{(i,t)} / \partial w_{kl}$
that represent the sensitivity of a unit $j$ with respect to a weight from unit $k$
to unit $l$. They are initialized to zero for $t = 0$:
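A minimal sketch of how these sensitivities might be stored, initialized, and carried forward is given below. To keep it short it considers only the buffered-access recurrent weights, holds $\pi_{jkl}$ in a three-dimensional array pi[j, k, l], and evaluates the transfer-function derivative from the unit output; the recursion it implements is the standard RTRL update of Williams and Zipser under these simplifying assumptions, not a transcription of the equation from the text.

```python
import numpy as np

def rtrl_step(pi, W_buf, o_prev, o_new, f_prime=lambda o: 1.0 - o ** 2):
    """Advance the sensitivities pi[j, k, l] = d o_j / d w_kl by one time step
    for a network with buffered-access weights only, o^(t+1) = f(W_buf^T o^(t)).
    Illustrative sketch of the standard RTRL recursion."""
    n = o_prev.shape[0]
    pi_new = np.zeros_like(pi)
    for j in range(n):
        # how the sensitivities of the previous outputs flow into unit j
        recur = np.tensordot(W_buf[:, j], pi, axes=(0, 0))   # sum_i w_ij * pi[i]
        # w_kl appears in unit j's net input only if l == j, with factor o_k(t)
        direct = np.zeros((n, n))
        direct[:, j] = o_prev
        pi_new[j] = f_prime(o_new[j]) * (recur + direct)
    return pi_new

# Initialization for t = 0, as stated in the text: all sensitivities are zero.
# pi = np.zeros((n, n, n))
```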