need to explicitly compute the derivative of the activa-
tion function. Instead, this derivative is implicitly com-
puted in the difference of the activation states.
Once we have the δ terms for the hidden units com-
puted as the difference in activation across the two
phases, we end up with the same form as equation 5.32:
\Delta w_{ij} = \epsilon \, \delta_j \, s_i = \epsilon \, (h_j^+ - h_j^-) \, s_i \qquad (5.37)
Thus, through bidirectional connectivity and the ap-
proximation of the product of net input differences and
the derivative of the activation function, hidden units
implicitly compute the information needed to mini-
mize error as in backpropagation, but using only locally
available activity signals.
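The simplicity of this rule is easy to see in code. The following is a minimal sketch (not the book's own implementation) of the hidden-unit update in equation 5.37; it assumes rate-coded activations stored as NumPy arrays, and the names s, h_minus, h_plus, and lrate are purely illustrative:

```python
import numpy as np

def generec_hidden_update(s, h_minus, h_plus, lrate=0.1):
    """GeneRec-style update for input-to-hidden weights (cf. equation 5.37).

    s        -- sending (input) activations, identical in both phases
    h_minus  -- hidden activations after settling in the minus phase
    h_plus   -- hidden activations after settling in the plus phase
    lrate    -- learning rate (epsilon)
    """
    # The error term is just the activation difference across the two phases;
    # no derivative of the activation function is computed anywhere.
    delta = h_plus - h_minus
    return lrate * np.outer(s, delta)   # dW[i, j] = lrate * s_i * delta_j

# Illustrative call with made-up activation values:
dW = generec_hidden_update(np.array([1.0, 0.0, 0.5]),
                           np.array([0.2, 0.7]),
                           np.array([0.6, 0.4]))
```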
Finally, it should be noted that GeneRec is only an
approximation to the actual backpropagation procedure.
In a bidirectional network with potentially complex set-
tling dynamics, the propagation of the two phases of
activation values separately and the calculation of their
difference (GeneRec) is not guaranteed to be the same
as directly propagating the difference itself (backprop-
agation). However, the approximation holds up quite
well even in deep multilayered networks performing
complicated learning tasks (O'Reilly, 1996a).
5.7.2 Symmetry, Midpoint, and CHL

As we mentioned, we can improve upon equation 5.32 (Δw_ij = ε(y_j^+ − y_j^−) x_i^−) in two small but significant ways. First, there is a more sophisticated way of updating weights, known as the midpoint method, that uses the average of both the minus and plus phase activation of the sending unit x_i, instead of just the minus phase alone (O'Reilly, 1996a):

\Delta w_{ij} = \epsilon \, (y_j^+ - y_j^-) \, \frac{x_i^- + x_i^+}{2} \qquad (5.38)

Second, the mathematical derivation of the learning rule depends on the weights being symmetric, and yet the basic GeneRec equation is not symmetric (i.e., the weight changes computed by unit j from unit i are not the same as those computed by unit i from unit j). So, even if the weights started out symmetric, they would not likely remain that way under the basic GeneRec equation. Making the weight changes symmetric (the same in both directions) both preserves any existing weight symmetry, and, when combined with a small amount of weight decay (Hinton, 1989b) and/or soft weight bounding, actually works to symmetrize initially asymmetric weights. A simple way of preserving symmetry is to take the average of the weight updates for the different weight directions:

\Delta w_{ij} = \epsilon \left[ (y_j^+ - y_j^-) \frac{x_i^- + x_i^+}{2} + (x_i^+ - x_i^-) \frac{y_j^- + y_j^+}{2} \right] = \epsilon \, (x_i^+ y_j^+ - x_i^- y_j^-) \qquad (5.39)
(where the 2 for averaging the weight updates in the two different directions gets folded into the arbitrary learning rate constant ε). Because many terms end up can-
celing, the weight change rule that results is a simple
function of the coproduct of the sending and receiving
activations in the plus phase, minus this coproduct in
the minus phase. The simplicity of this rule makes it
more plausible that the brain might have adopted such a
symmetry producing mechanism.
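To see why the terms cancel, expand the two averaged updates in equation 5.39:

(y_j^+ - y_j^-)\frac{x_i^- + x_i^+}{2} + (x_i^+ - x_i^-)\frac{y_j^- + y_j^+}{2}
= \tfrac{1}{2}\bigl(y_j^+ x_i^- + y_j^+ x_i^+ - y_j^- x_i^- - y_j^- x_i^+\bigr) + \tfrac{1}{2}\bigl(x_i^+ y_j^- + x_i^+ y_j^+ - x_i^- y_j^- - x_i^- y_j^+\bigr)
= x_i^+ y_j^+ - x_i^- y_j^-

The mixed-phase products cancel in pairs, leaving only the plus-phase coproduct minus the minus-phase coproduct.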
To summarize the mathematical side of things before
providing further biological, ecological, and psycholog-
ical support for something like the GeneRec algorithm
in the human cortex (which we address in the next sec-
tion): we use equation 5.39 to adjust the weights in the
network, subject additionally to the soft weight bound-
ing procedure described previously. The bias weight
update remains unaffected by the symmetry and mid-
point modifications, and is thus given by equation 5.33.
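Putting the pieces together, one update step under this recipe can be sketched as follows. This is not the book's implementation: the function names are invented, the particular soft weight bounding expression (scaling increases by 1 − w and decreases by w, which keeps weights in [0, 1]) is an assumption based on the general description given earlier, and the bias update is written as the difference between the plus and minus phase activations of the receiving unit, which is what equation 5.33 reduces to when a bias is treated as a weight from a unit whose activation is always 1.

```python
import numpy as np

def chl_weight_update(x_minus, y_minus, x_plus, y_plus, w, lrate=0.1):
    """CHL update (equation 5.39) with a soft weight bounding sketch.

    x_* -- sending-layer activations in the minus/plus phases
    y_* -- receiving-layer activations in the minus/plus phases
    w   -- current weights, shape (n_send, n_recv), assumed to lie in [0, 1]
    """
    # Plus-phase coproduct minus minus-phase coproduct (the CHL rule).
    dw = lrate * (np.outer(x_plus, y_plus) - np.outer(x_minus, y_minus))
    # Soft weight bounding: increases slow down near 1, decreases near 0.
    dw = np.where(dw > 0, dw * (1.0 - w), dw * w)
    return w + dw

def chl_bias_update(y_minus, y_plus, bias, lrate=0.1):
    """Bias update; the symmetry and midpoint modifications do not apply."""
    return bias + lrate * (y_plus - y_minus)
```

Note that the weight change for the reciprocal (receiving-to-sending) direction is simply the transpose of dw, which is what makes the rule symmetric in practice.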
The learning rule in equation 5.39 provides an in-
teresting bridge to a set of other learning algorithms
in the field. Specifically, it is identical to the con-
trastive Hebbian learning (CHL) algorithm, which
is so named because it is the contrast (difference) be-
tween two Hebbian-like terms (the sender-receiver co-
products). We will therefore refer to equation 5.39 as
the CHL learning rule. In the following subsection, we
discuss the original derivation of CHL and other similar
algorithms.