Figure 5.8: Illustration of the basic processes in the standard backpropagation algorithm. a) Shows the feedforward propagation of activations, with each layer using the same sigmoidal logistic activation function σ(η) applied to its summed net input: h_j = σ(η_j) with η_j = Σ_i s_i w_ij, and o_k = σ(η_k) with η_k = Σ_j h_j w_jk. b) Shows the error backpropagation step from the targets t_k, where the challenge is to figure out how to update the weights w_ij from the input units to the hidden units, because we already know how to adjust the hidden to output weights using the delta rule (Δw_jk ∝ (t_k - o_k) h_j, while w_ij = ? remains to be determined).

Though it took some time before backpropagation was rediscovered by Rumelhart, Hinton, and Williams (1986a) (the idea had been independently discovered several times before: Bryson & Ho, 1969; Werbos, 1974; Parker, 1985), this algorithm is really just a simple extension of the delta rule that continues the chain rule down into the hidden units and their weights. To see how this works, let's take the standard case of a three-layered network (input, hidden, and output) with feedforward connectivity and sigmoidal activation functions, shown in figure 5.8. We will write the activation of an output unit as o_k, that of a hidden unit as h_j, and that of an input (or stimulus) unit as s_i.

The first thing to note is that the weights from the hidden units into the output units can be trained using the simple delta rule as derived previously, because this part of the network is just like a single layer network. Thus, the real challenge solved by backpropagation is training the weights from the input to the hidden layer.
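To make this setup concrete, the feedforward pass of figure 5.8a can be sketched in a few lines of Python/NumPy. This is only an illustration: the names sigmoid, forward, w_ij, and w_jk are our own choices rather than anything from the text, and the sketch simply applies the logistic sigmoid σ(η) to each layer's summed net input.

```python
import numpy as np

def sigmoid(eta):
    # Logistic sigmoid: sigma(eta) = 1 / (1 + exp(-eta)).
    return 1.0 / (1.0 + np.exp(-eta))

def forward(s, w_ij, w_jk):
    """Feedforward pass of the three-layer network in figure 5.8a.

    s    -- input (stimulus) activations s_i, shape (n_input,)
    w_ij -- input-to-hidden weights, shape (n_input, n_hidden)
    w_jk -- hidden-to-output weights, shape (n_hidden, n_output)
    """
    eta_j = s @ w_ij    # net input to hidden unit j: sum_i s_i w_ij
    h = sigmoid(eta_j)  # hidden activations h_j = sigma(eta_j)
    eta_k = h @ w_jk    # net input to output unit k: sum_j h_j w_jk
    o = sigmoid(eta_k)  # output activations o_k = sigma(eta_k)
    return h, o
```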
As usual, we will first present the net result of the backpropagation algorithm: the actual learning rule equations that are used to change the weights. Then we will present the sequence of mathematical steps that led to the learning rule in the next section. Thus, those who lack the patience or mathematical background to explore the derivation in the next section can nevertheless achieve some level of understanding of how the algorithm works in this section. We do caution, however, that much of the interesting material in this and subsequent sections hinges on some of the details of the derivation, so you will miss quite a bit if you skip it.

The weights at every layer in the network are adjusted in backpropagation according to the following learning rule:
\[ \Delta w_{ij} = \epsilon \, x_i \, \delta_j \qquad (5.21) \]
where x_i represents the activation of the sending unit associated with the weight w_ij being changed, δ_j is the contribution of the receiving unit j toward the overall error at the output layer (which we will define in a moment), and ε is the learning rate as usual. Perhaps the most important thing to notice about this learning rule is that it captures the basic credit assignment property of the delta rule, by virtue of adjusting the weights in proportion to the activation of the sending unit.

The weight update rule for bias weights can be derived as usual by just setting the sending activation x_i to 1, so that equation 5.21 becomes:
\[ \Delta \beta_j = \epsilon \, \delta_j \qquad (5.22) \]
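As a rough illustration of how equations 5.21 and 5.22 translate into code (continuing the hypothetical NumPy sketch above; the helper names are again our own, epsilon stands for the learning rate ε, and the delta values are simply passed in because they are defined next):

```python
def update_weights(w, x, delta, epsilon):
    # Equation 5.21: delta_w_ij = epsilon * x_i * delta_j for every sending
    # unit i and receiving unit j, written as an outer product.
    return w + epsilon * np.outer(x, delta)

def update_biases(beta, delta, epsilon):
    # Equation 5.22: the same rule with the sending activation fixed at 1,
    # so each bias weight changes by epsilon * delta_j.
    return beta + epsilon * delta
```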
Now, let's try to understand δ. First, for the output units, we know from the delta rule that:
\[ \delta_k = t_k - o_k \qquad (5.23) \]
and this accords well with the idea that δ_k is the contribution of the output unit k to the overall error of the network. The crux of backpropagation is computing δ_j, the contribution of the hidden unit j to the overall network error. We will see that this contribution can be computed as:
\[ \delta_j = \sum_k \delta_k w_{jk} \; h_j (1 - h_j) \qquad (5.24) \]
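Putting equations 5.21-5.24 together, one weight update for the three-layer network of figure 5.8 might look as follows. This is a sketch under the same assumptions as the earlier snippets (it reuses the hypothetical forward and update_weights helpers, and the learning rate value is arbitrary):

```python
def backprop_step(s, t, w_ij, w_jk, epsilon=0.1):
    # Feedforward pass (figure 5.8a).
    h, o = forward(s, w_ij, w_jk)

    # Equation 5.23: error contribution of each output unit k.
    delta_k = t - o

    # Equation 5.24: error contribution of each hidden unit j, obtained by
    # sending the output deltas back through w_jk and multiplying by the
    # sigmoid derivative h_j * (1 - h_j).
    delta_j = (w_jk @ delta_k) * h * (1.0 - h)

    # Equation 5.21 applied at each layer: sending activation times delta.
    w_jk = update_weights(w_jk, h, delta_k, epsilon)
    w_ij = update_weights(w_ij, s, delta_j, epsilon)
    return w_ij, w_jk
```

A training loop would simply repeat this step across the input-target pairs in the training set; the derivation in the next section shows where the Σ_k δ_k w_jk and h_j (1 - h_j) factors in equation 5.24 come from.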