the sending units:

o_k = \frac{1}{1 + e^{-\eta_k}}  (5.12)

It is sometimes convenient to write the sigmoid function as σ(η_k), where the σ is intended to evoke the sigmoidal character of the function:

\sigma(\eta_k) = \frac{1}{1 + e^{-\eta_k}}  (5.13)
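As a concrete reference (our own sketch, not from the text), the sigmoid of equations 5.12 and 5.13 can be written in a few lines of Python; the names sigmoid and eta are our own:

```python
import numpy as np

def sigmoid(eta):
    """Logistic sigmoid of the net input eta_k (equations 5.12 and 5.13)."""
    return 1.0 / (1.0 + np.exp(-eta))

# The function saturates softly toward 0 and 1 for strongly negative or positive net input.
print(sigmoid(np.array([-4.0, 0.0, 4.0])))  # approximately [0.018, 0.5, 0.982]
```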
Now, we can take the derivative of the CE error function with respect to the weight, again using the chain rule. To deal with the more complex sigmoidal activation function, we have to extend the chain rule to include a separate step for the derivative of the activation function with respect to its net input, do_k/dη_k, and then the net input with respect to the weight, ∂η_k/∂w_ik:

\frac{\partial CE}{\partial w_{ik}} = \frac{\partial CE}{\partial o_k} \frac{d o_k}{d \eta_k} \frac{\partial \eta_k}{\partial w_{ik}}  (5.14)

Again, we will break this out into the component terms:

\frac{\partial CE}{\partial o_k} = -\left( \frac{t_k}{o_k} - \frac{1 - t_k}{1 - o_k} \right) = -\frac{t_k - o_k}{o_k (1 - o_k)}  (5.15)

and¹

\frac{d o_k}{d \eta_k} = \sigma'(\eta_k) = o_k (1 - o_k)  (5.16)

Most readers need not derive this derivative themselves; it is somewhat involved. Finally, the derivative of the net input term is just like that of the linear activation from before:

\frac{\partial \eta_k}{\partial w_{ik}} = s_i  (5.17)

When we put these terms all together, the denominator of the first term cancels with the derivative of the sigmoid function, resulting in exactly the same delta rule we had before:

\Delta w_{ik} = \epsilon (t_k - o_k) s_i  (5.18)

Thus, the introduction of the sigmoid is offset by the use of the CE error function, so that the weights are adapted in the same way as when a linear activation function is used.
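The algebra above is easy to check numerically. The following sketch is our own illustration (not from the text): it compares the analytic forms in equations 5.16 and 5.18 against finite-difference estimates for a single output unit, with arbitrary made-up values for s, w, and t:

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.random(5)    # sending activations s_i (arbitrary values)
w = rng.random(5)    # weights w_ik into a single output unit k
t = 0.8              # target t_k
h = 1e-6             # finite-difference step

def sigmoid(eta):
    return 1.0 / (1.0 + np.exp(-eta))

def ce(w):
    """Cross-entropy error for one output unit with sigmoidal activation."""
    o = sigmoid(s @ w)
    return -(t * np.log(o) + (1.0 - t) * np.log(1.0 - o))

eta = s @ w
o = sigmoid(eta)

# Equation 5.16: the derivative of the sigmoid is o_k(1 - o_k).
numeric_slope = (sigmoid(eta + h) - sigmoid(eta - h)) / (2.0 * h)
print(np.isclose(numeric_slope, o * (1.0 - o)))   # True

# The gradient behind equation 5.18: dCE/dw_ik = -(t_k - o_k) s_i,
# so the delta rule dw_ik = eps (t_k - o_k) s_i is just gradient descent on CE.
analytic = -(t - o) * s
numeric = np.array([(ce(w + h * np.eye(5)[i]) - ce(w - h * np.eye(5)[i])) / (2.0 * h)
                    for i in range(5)])
print(np.allclose(analytic, numeric))             # True
```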
5.4.2 Soft Weight Bounding

The second problem, the unbounded nature of the error-driven weights, is incompatible with both the facts of biology and the point-neuron activation function, which requires a separation between excitation and inhibition. Therefore, we use the following mechanism for bounding the error-driven weights (note that this does not apply to the bias weights, which have no such sign constraints):

\Delta w_{ik} = [\Delta_{ik}]_+ (1 - w_{ik}) + [\Delta_{ik}]_- w_{ik}  (5.19)

where Δ_ik is the weight change computed by the error-driven algorithm (e.g., equation 5.9), and the [x]_+ operator returns x if x > 0 and 0 otherwise, while [x]_- does the opposite, returning x if x < 0, and 0 otherwise.
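As a rough sketch of how equation 5.19 could be applied in code (our own illustration; dwt stands in for the raw error-driven change Δ_ik, and soft_bound is a hypothetical helper, not a function from the text):

```python
import numpy as np

def soft_bound(w, dwt):
    """Soft weight bounding (equation 5.19): positive changes are scaled by the
    remaining distance to the upper bound (1 - w), and negative changes by the
    distance to the lower bound (w)."""
    pos = np.maximum(dwt, 0.0)   # the [x]+ operator
    neg = np.minimum(dwt, 0.0)   # the [x]- operator
    return w + pos * (1.0 - w) + neg * w

w = np.array([0.1, 0.5, 0.9])
print(soft_bound(w, np.full(3, 0.2)))    # approximately [0.28, 0.60, 0.92]
print(soft_bound(w, np.full(3, -0.2)))   # approximately [0.08, 0.40, 0.72]
```

Note how the same raw change moves a weight less and less as it approaches the bound it is heading toward, which is what makes the bounding soft.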
Equation 5.19 imposes the same kind of soft weight bounding that the CPCA algorithm has naturally, where the weights approach the bounds of 1 and 0 exponentially slowly (softly). Note that this equation has the same general form as the expanded form of the CPCA Hebbian weight update rule (equation 4.17), and also the equation for updating the membrane potential from chapter 2. Thus, like these other functions, it provides a natural interpretation of weight values centered on the middle value of .5.
For example, when there is a series of individual weight changes of equal magnitude but opposite sign, the weight will hover around .5, which corresponds well with the Hebbian interpretation of .5 as reflecting lack of positive or negative correlation. Similarly, as positive weight increases outweigh negative ones, the weight value increases proportionally, and likewise decreases proportionally for more negative changes than positive.
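To make the hovering behavior concrete, here is a small simulation of equation 5.19 (our own, not from the text) that applies equal-magnitude changes of alternating sign; the step size of .02 and the starting weight of .9 are arbitrary:

```python
def soft_bound_step(w, dwt):
    """Equation 5.19 applied to a single scalar weight."""
    return w + max(dwt, 0.0) * (1.0 - w) + min(dwt, 0.0) * w

w = 0.9                                      # start far from the midpoint
for step in range(200):
    dwt = 0.02 if step % 2 == 0 else -0.02   # equal magnitude, opposite sign
    w = soft_bound_step(w, dwt)

print(round(w, 3))  # settles very close to .5, as described above
```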
Weight bounding is appealing from a biological perspective because it is clear that synaptic efficacy has limits. We know that synapses do not change their sign, so they must be bounded at the lower end by zero. The upper bound is probably determined by such things as the maximal amount of neurotransmitter that can
¹ Note here that we can use the ′ (prime) notation to indicate the derivative of the sigmoid function. We use d to indicate a simple derivative, because this equation has a single variable (η_k), in contrast with most other cases where we use ∂ to indicate a partial derivative (i.e., with respect to only one of multiple variables).