the sending units:

o_k = \frac{1}{1 + e^{-\eta_k}}  (5.12)

It is sometimes convenient to write the sigmoid function as σ(η_k), where the σ is intended to evoke the sigmoidal character of the function:

\sigma(\eta_k) = \frac{1}{1 + e^{-\eta_k}}  (5.13)
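As a concrete reference (our own sketch, not from the text), the sigmoid of equations 5.12 and 5.13 can be written in a few lines of Python; the names sigmoid and eta are our own:

```python
import numpy as np

def sigmoid(eta):
    """Logistic sigmoid of the net input eta_k (equations 5.12 and 5.13)."""
    return 1.0 / (1.0 + np.exp(-eta))

# The function saturates softly toward 0 and 1 for strongly negative or positive net input.
print(sigmoid(np.array([-4.0, 0.0, 4.0])))  # approximately [0.018, 0.5, 0.982]
```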
Now, we can take the derivative of the CE error function with respect to the weight, again using the chain rule. To deal with the more complex sigmoidal activation function, we have to extend the chain rule to include a separate step for the derivative of the activation function with respect to its net input, do_k/dη_k, and then the net input with respect to the weight, ∂η_k/∂w_ik:

\frac{\partial CE}{\partial w_{ik}} = \frac{\partial CE}{\partial o_k} \frac{d o_k}{d \eta_k} \frac{\partial \eta_k}{\partial w_{ik}}  (5.14)

Again, we will break this out into the component terms:

\frac{\partial CE}{\partial o_k} = -\left( \frac{t_k}{o_k} - \frac{1 - t_k}{1 - o_k} \right) = -\frac{t_k - o_k}{o_k (1 - o_k)}  (5.15)

and¹

\frac{d o_k}{d \eta_k} = \sigma'(\eta_k) = o_k (1 - o_k)  (5.16)

Most readers need not derive this derivative themselves; it is somewhat involved. Finally, the derivative of the net input term is just like that of the linear activation from before:

\frac{\partial \eta_k}{\partial w_{ik}} = s_i  (5.17)

When we put these terms all together, the denominator of the first term cancels with the derivative of the sigmoid function, resulting in exactly the same delta rule we had before:

\Delta w_{ik} = \epsilon (t_k - o_k) s_i  (5.18)

Thus, the introduction of the sigmoid is offset by the use of the CE error function, so that the weights are adapted in the same way as when a linear activation function is used.
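The algebra above is easy to check numerically. The following sketch is our own illustration (not from the text): it compares the analytic forms in equations 5.16 and 5.18 against finite-difference estimates for a single output unit, with arbitrary made-up values for s, w, and t:

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.random(5)    # sending activations s_i (arbitrary values)
w = rng.random(5)    # weights w_ik into a single output unit k
t = 0.8              # target t_k
h = 1e-6             # finite-difference step

def sigmoid(eta):
    return 1.0 / (1.0 + np.exp(-eta))

def ce(w):
    """Cross-entropy error for one output unit with sigmoidal activation."""
    o = sigmoid(s @ w)
    return -(t * np.log(o) + (1.0 - t) * np.log(1.0 - o))

eta = s @ w
o = sigmoid(eta)

# Equation 5.16: the derivative of the sigmoid is o_k(1 - o_k).
numeric_slope = (sigmoid(eta + h) - sigmoid(eta - h)) / (2.0 * h)
print(np.isclose(numeric_slope, o * (1.0 - o)))   # True

# The gradient behind equation 5.18: dCE/dw_ik = -(t_k - o_k) s_i,
# so the delta rule dw_ik = eps (t_k - o_k) s_i is just gradient descent on CE.
analytic = -(t - o) * s
numeric = np.array([(ce(w + h * np.eye(5)[i]) - ce(w - h * np.eye(5)[i])) / (2.0 * h)
                    for i in range(5)])
print(np.allclose(analytic, numeric))             # True
```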
5.4.2 Soft Weight Bounding

The second problem, the unbounded nature of the error-driven weights, is incompatible with both the facts of biology and the point-neuron activation function, which requires a separation between excitation and inhibition. Therefore, we use the following mechanism for bounding the error-driven weights (note that this does not apply to the bias weights, which have no such sign constraints):

\Delta w_{ik} = [\Delta_{ik}]_+ (1 - w_{ik}) + [\Delta_{ik}]_- w_{ik}  (5.19)

where Δ_ik is the weight change computed by the error-driven algorithm (e.g., equation 5.9), and the [x]_+ operator returns x if x > 0 and 0 otherwise, while [x]_- does the opposite, returning x if x < 0, and 0 otherwise.
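As a rough sketch of how equation 5.19 could be applied in code (our own illustration; dwt stands in for the raw error-driven change Δ_ik, and soft_bound is a hypothetical helper, not a function from the text):

```python
import numpy as np

def soft_bound(w, dwt):
    """Soft weight bounding (equation 5.19): positive changes are scaled by the
    remaining distance to the upper bound (1 - w), and negative changes by the
    distance to the lower bound (w)."""
    pos = np.maximum(dwt, 0.0)   # the [x]+ operator
    neg = np.minimum(dwt, 0.0)   # the [x]- operator
    return w + pos * (1.0 - w) + neg * w

w = np.array([0.1, 0.5, 0.9])
print(soft_bound(w, np.full(3, 0.2)))    # approximately [0.28, 0.60, 0.92]
print(soft_bound(w, np.full(3, -0.2)))   # approximately [0.08, 0.40, 0.72]
```

Note how the same raw change moves a weight less and less as it approaches the bound it is heading toward, which is what makes the bounding soft.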
Equation 5.19 imposes the same kind of soft weight bounding that the CPCA algorithm has naturally, where the weights approach the bounds of 1 and 0 exponentially slowly (softly). Note that this equation has the same general form as the expanded form of the CPCA Hebbian weight update rule (equation 4.17), and also the equation for updating the membrane potential from chapter 2. Thus, like these other functions, it provides a natural interpretation of weight values centered on the middle value of .5.
For example, when there is a series of individual weight changes of equal magnitude but opposite sign, the weight will hover around .5, which corresponds well with the Hebbian interpretation of .5 as reflecting lack of positive or negative correlation. Similarly, as positive weight increases outweigh negative ones, the weight value increases proportionally, and likewise decreases proportionally for more negative changes than positive.
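To make the hovering behavior concrete, here is a small simulation of equation 5.19 (our own, not from the text) that applies equal-magnitude changes of alternating sign; the step size of .02 and the starting weight of .9 are arbitrary:

```python
def soft_bound_step(w, dwt):
    """Equation 5.19 applied to a single scalar weight."""
    return w + max(dwt, 0.0) * (1.0 - w) + min(dwt, 0.0) * w

w = 0.9                                      # start far from the midpoint
for step in range(200):
    dwt = 0.02 if step % 2 == 0 else -0.02   # equal magnitude, opposite sign
    w = soft_bound_step(w, dwt)

print(round(w, 3))  # settles very close to .5, as described above
```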
Weight bounding is appealing from a biological perspective because it is clear that synaptic efficacy has limits. We know that synapses do not change their sign, so they must be bounded at the lower end by zero. The upper bound is probably determined by such things as the maximal amount of neurotransmitter that can
¹ Note here that we can use the ′ (prime) notation to indicate the derivative of the sigmoid function. We use d to indicate a simple derivative, because this equation has a single variable (η_k), in contrast with most other cases where we use ∂ to indicate a partial derivative (i.e., with respect to only one of multiple variables).