where t_k is again the target value (not to be confused with the event index t), and o_k is the actual output activation; both are implicitly functions of time (event) t.
Equation 5.2 will be zero when the outputs exactly match the targets for all events in the environment or training set, and larger values will reflect worse performance. The goal of task learning can thus be cast as that of minimizing this error measure (also known as gradient descent in error). We refer to this as error-driven learning. In this context, SSE (equation 5.2) serves as an objective function for error-driven learning, in that it specifies the objective of learning.
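As a concrete (if minimal) sketch, equation 5.2 can be computed as follows; the NumPy arrays, their shapes, and the function name here are illustrative assumptions for this example, not anything taken from the simulator itself.

```python
import numpy as np

def sse(targets, outputs):
    """Summed squared error (equation 5.2): sum over events t and output
    units k of (t_k - o_k)**2.  Both arrays are assumed to have shape
    (n_events, n_output_units)."""
    return float(np.sum((targets - outputs) ** 2))

# Zero when outputs exactly match targets; larger values mean worse performance.
targets = np.array([[1.0, 0.0], [0.0, 1.0]])
outputs = np.array([[0.8, 0.1], [0.2, 0.7]])
print(round(sse(targets, outputs), 2))  # 0.18
```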
One standard and rather direct way to minimize any
function is to first take its derivative with respect to the
free parameters. The derivative gives you the slope of
the function, or how the function changes with changes
to the free parameters. For example:
The derivative of a network's error with respect to its weights indicates how the error changes as the weights change.

Once this derivative has been computed, the network's weights can then be adjusted to minimize the network's errors. The derivative thus provides the basis for our learning rule. We will work through exactly how to take this derivative in a moment, but first we will present the result to provide a sense of what the resulting learning rule looks and acts like.

Taking the negative of the derivative of SSE with respect to the weights, we get a weight update or learning rule called the delta rule:

\Delta w_{ik} = \epsilon (t_k - o_k) s_i    (5.3)

where s_i is the input (stimulus) unit activation, and \epsilon is the learning rate as usual. This is also known as least mean squares (LMS), and it has been around for some time (Widrow & Hoff, 1960). Essentially the same equation is used in the Rescorla-Wagner rule for classical (Pavlovian) conditioning (Rescorla & Wagner, 1972).
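As a minimal sketch of the delta rule in code (the function name, array shapes, and default learning rate below are assumptions made for this example, not the book's simulator):

```python
import numpy as np

def delta_rule_update(w, s, t, o, lrate=0.1):
    """One delta-rule step (equation 5.3): dw_ik = lrate * (t_k - o_k) * s_i.

    w: (n_inputs, n_outputs) weight matrix
    s: (n_inputs,)  sending (input) unit activations
    t: (n_outputs,) target values
    o: (n_outputs,) actual output activations
    """
    dw = lrate * np.outer(s, t - o)  # each weight moves in proportion to s_i and the error (t_k - o_k)
    return w + dw
```

Note that the update is just the outer product of the sending activations with the output errors, scaled by the learning rate; this is what gives rise to the credit assignment behavior described next.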
It should make sense that this learning rule will adjust the weights to reduce the error. Basically, it says that the weights should change as a function of the local error for the individual output unit (t_k - o_k) and the activation of the sending unit s_i. Thus, those sending units that are more active when a big error is made will receive most of the blame for this error. For example, if the output unit was active and it shouldn't have been (i.e., (t_k - o_k) is negative), then the weights from those input units that were active will be decreased. On the other hand, if the output unit wasn't active when it should have been, then the weights from those input units that were active will increase. Thus, the next time around, the unit's activation should be closer to the target value, and hence the error will be reduced.

Figure 5.4: Illustration of the credit assignment process, where the activity of a unit is represented by how bright it is. a) If the output unit was not very active and it should have been more so (i.e., t_k - o_k is positive), then the weights will all increase, but in proportion to the activity of the sending units (because the most active sending units can do the most good). b) The same principle holds when the output unit was too active.
This process of adjusting weights in proportion to the sending unit activations is called credit assignment (though a more appropriate name might be blame assignment), illustrated in figure 5.4. Credit assignment is perhaps the most important computational property of error-driven learning rules (i.e., on a similar level as correlational learning for Hebbian learning rules).
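To put hypothetical numbers on this (the activations, error, and learning rate below are invented purely for illustration): if the output unit was too active, every weight from an active sender decreases, and the most active sender is blamed the most.

```python
import numpy as np

s = np.array([1.0, 0.5, 0.0])  # sending unit activations
err = -0.8                     # t_k - o_k: the output unit was too active
lrate = 0.1

dw = lrate * err * s           # delta-rule change to this output unit's weights
print(dw)                      # approximately [-0.08, -0.04, 0.0]: blame scales with sender activity
```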
One can view the representations formed by error-driven learning as the result of a multiple credit satisfaction mechanism: an integration of the synergies and conflicts of the credit assignment process on each input-output pattern over the entire training set. Thus, instead of reflecting the strongest correlations, as Hebbian learning does, the weights here reflect the strongest solutions to the task at hand (i.e., those solutions that satisfy the most input-output mappings).
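The toy simulation below is one way to see this integration over a whole training set; the patterns, network size, learning rate, and the use of simple linear output units are all assumptions of this sketch. After repeated sweeps, the delta rule settles on weights that satisfy all four input-output mappings at once.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented training set: 4 input patterns, each mapped to 2 target outputs.
inputs = np.array([[1., 0., 1.],
                   [0., 1., 1.],
                   [1., 1., 0.],
                   [0., 0., 1.]])
targets = np.array([[1., 0.],
                    [0., 1.],
                    [1., 1.],
                    [0., 0.]])

w = rng.normal(scale=0.1, size=(3, 2))   # weights from 3 inputs to 2 outputs
lrate = 0.2

for epoch in range(200):                 # repeated sweeps through the training set
    for s, t in zip(inputs, targets):
        o = s @ w                        # linear output activations (for simplicity)
        w += lrate * np.outer(s, t - o)  # delta rule (equation 5.3)

print(np.round(inputs @ w, 2))           # outputs now closely match the targets
```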