Figure 5.8: Illustration of the basic processes in the standard backpropagation algorithm. a) Shows the feedforward propagation of activations, with each layer using the same sigmoidal logistic activation function σ(η) applied to its summed net input: h_j = σ(η_j) with η_j = Σ_i s_i w_ij, and o_k = σ(η_k) with η_k = Σ_j h_j w_jk. b) Shows the error backpropagation step from the targets t_k, where the challenge is to figure out how to update the weights w_ij from the input units to the hidden units, because we already know how to adjust the hidden to output weights using the delta rule (Δw_jk ∝ (t_k - o_k) h_j, while w_ij = ? remains to be determined).

Though it took some time before backpropagation was rediscovered by Rumelhart, Hinton, and Williams (1986a) (the idea had been independently discovered several times before: Bryson & Ho, 1969; Werbos, 1974; Parker, 1985), this algorithm is really just a simple extension of the delta rule that continues the chain rule down into the hidden units and their weights. To see how this works, let's take the standard case of a three-layered network (input, hidden, and output) with feedforward connectivity and sigmoidal activation functions, shown in figure 5.8. We will write the activation of an output unit as o_k, that of a hidden unit as h_j, and that of an input (or stimulus) unit as s_i.

The first thing to note is that the weights from the hidden units into the output units can be trained using the simple delta rule as derived previously, because this part of the network is just like a single layer network. Thus, the real challenge solved by backpropagation is training the weights from the input to the hidden layer.
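To make this setup concrete, the feedforward pass of figure 5.8a can be sketched in a few lines of Python/NumPy. This is only an illustration: the names sigmoid, forward, w_ij, and w_jk are our own choices rather than anything from the text, and the sketch simply applies the logistic sigmoid σ(η) to each layer's summed net input.

```python
import numpy as np

def sigmoid(eta):
    # Logistic sigmoid: sigma(eta) = 1 / (1 + exp(-eta)).
    return 1.0 / (1.0 + np.exp(-eta))

def forward(s, w_ij, w_jk):
    """Feedforward pass of the three-layer network in figure 5.8a.

    s    -- input (stimulus) activations s_i, shape (n_input,)
    w_ij -- input-to-hidden weights, shape (n_input, n_hidden)
    w_jk -- hidden-to-output weights, shape (n_hidden, n_output)
    """
    eta_j = s @ w_ij    # net input to hidden unit j: sum_i s_i w_ij
    h = sigmoid(eta_j)  # hidden activations h_j = sigma(eta_j)
    eta_k = h @ w_jk    # net input to output unit k: sum_j h_j w_jk
    o = sigmoid(eta_k)  # output activations o_k = sigma(eta_k)
    return h, o
```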
As usual, we will first present the net result of the backpropagation algorithm: the actual learning rule equations that are used to change the weights. Then we will present the sequence of mathematical steps that led to the learning rule in the next section. Thus, those who lack the patience or mathematical background to explore the derivation in the next section can nevertheless achieve some level of understanding of how the algorithm works in this section. We do caution, however, that much of the interesting material in this and subsequent sections hinges on some of the details of the derivation, so you will miss quite a bit if you skip it.

The weights at every layer in the network are adjusted in backpropagation according to the following learning rule:
\[ \Delta w_{ij} = \epsilon \, x_i \, \delta_j \qquad (5.21) \]
where x_i represents the activation of the sending unit associated with the weight w_ij being changed, δ_j is the contribution of the receiving unit j toward the overall error at the output layer (which we will define in a moment), and ε is the learning rate as usual. Perhaps the most important thing to notice about this learning rule is that it captures the basic credit assignment property of the delta rule, by virtue of adjusting the weights in proportion to the activation of the sending unit.

The weight update rule for bias weights can be derived as usual by just setting the sending activation x_i to 1, so that equation 5.21 becomes:
\[ \Delta \beta_j = \epsilon \, \delta_j \qquad (5.22) \]
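As a rough illustration of how equations 5.21 and 5.22 translate into code (continuing the hypothetical NumPy sketch above; the helper names are again our own, epsilon stands for the learning rate ε, and the delta values are simply passed in because they are defined next):

```python
def update_weights(w, x, delta, epsilon):
    # Equation 5.21: delta_w_ij = epsilon * x_i * delta_j for every sending
    # unit i and receiving unit j, written as an outer product.
    return w + epsilon * np.outer(x, delta)

def update_biases(beta, delta, epsilon):
    # Equation 5.22: the same rule with the sending activation fixed at 1,
    # so each bias weight changes by epsilon * delta_j.
    return beta + epsilon * delta
```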
Now, let's try to understand δ. First, for the output units, we know from the delta rule that:
\[ \delta_k = t_k - o_k \qquad (5.23) \]
and this accords well with the idea that δ_k is the contribution of the output unit k to the overall error of the network. The crux of backpropagation is computing δ_j, the contribution of the hidden unit j to the overall network error. We will see that this contribution can be computed as:
\[ \delta_j = \sum_k \delta_k w_{jk} \; h_j (1 - h_j) \qquad (5.24) \]
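Putting equations 5.21-5.24 together, one weight update for the three-layer network of figure 5.8 might look as follows. This is a sketch under the same assumptions as the earlier snippets (it reuses the hypothetical forward and update_weights helpers, and the learning rate value is arbitrary):

```python
def backprop_step(s, t, w_ij, w_jk, epsilon=0.1):
    # Feedforward pass (figure 5.8a).
    h, o = forward(s, w_ij, w_jk)

    # Equation 5.23: error contribution of each output unit k.
    delta_k = t - o

    # Equation 5.24: error contribution of each hidden unit j, obtained by
    # sending the output deltas back through w_jk and multiplying by the
    # sigmoid derivative h_j * (1 - h_j).
    delta_j = (w_jk @ delta_k) * h * (1.0 - h)

    # Equation 5.21 applied at each layer: sending activation times delta.
    w_jk = update_weights(w_jk, h, delta_k, epsilon)
    w_ij = update_weights(w_ij, s, delta_j, epsilon)
    return w_ij, w_jk
```

A training loop would simply repeat this step across the input-target pairs in the training set; the derivation in the next section shows where the Σ_k δ_k w_jk and h_j (1 - h_j) factors in equation 5.24 come from.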