There are three different qualitative directions of AC change (increase, no change, and decrease) that have corresponding effects on the PFC. If the AC unit predicts that the current inputs will lead to future reward, it will encode these inputs in the active maintenance of the PFC. If there is no change in AC activity (i.e., no increase or decrease in expected rewards), then the modulatory gain on the PFC inputs is low, so that the information maintained in the PFC is undisturbed by ongoing processing (e.g., the "ignore" stimuli). If there is a decrease in dopamine (negative change in AC activity), then the modulatory gain on the input weights remains low, and the modulatory gain on the recurrent weights that implement maintenance in the PFC is also decreased, so that information in the PFC is deactivated. As reviewed previously, there is at least suggestive evidence for these kinds of effects of dopamine on the PFC.
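As a rough sketch of this three-way gating rule (the function and variable names here are hypothetical; only the base levels and the clipping to the 0-1 range are taken from the text), the qualitative behavior can be written as:

```python
def pfc_gains(delta_ac, noise=0.0, b_in=0.0, b_maint=1.0):
    """Compute PFC input and maintenance gains from the change in AC activation.

    delta_ac > 0:  reward predicted -> input gain rises, current input is stored.
    delta_ac == 0: both gains at base levels -> maintained info is protected.
    delta_ac < 0:  maintenance gain falls -> stored info is deactivated.
    """
    clip = lambda x: max(0.0, min(1.0, x))  # gains are bounded between 0 and 1
    s_in = clip(b_in + delta_ac + noise)
    s_maint = clip(b_maint + delta_ac + noise)
    return s_in, s_maint

# The three qualitative directions of AC change:
print(pfc_gains(+0.5))  # increase:  (0.5, 1.0) -> gate input in, keep maintenance
print(pfc_gains(0.0))   # no change: (0.0, 1.0) -> ignore input, maintain
print(pfc_gains(-0.5))  # decrease:  (0.0, 0.5) -> input closed, maintenance weakened
```

With the base levels at 0 and 1, a single signed quantity (the change in AC activation) is enough to produce all three regimes: storage, protected maintenance, and deactivation.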
The AC-PFC relationship is formalized in the model with the following equations for the absolute weight-scale terms s_in (the weight scaling of the PFC inputs) and s_maint (the weight scaling of the PFC self-maintenance connections):

    s_in = b_in + δ + ε        (9.1)

    s_maint = b_maint + δ + ε  (9.2)

where δ is the change in AC activation (see equation 6.5), and ε is a random noise value that allows for random trial-and-error exploration during the initial phases of learning. The base-level parameters b_in and b_maint determine the basic level of each weight-scaling (gain) parameter, and are typically set to 0 and 1, respectively. Both of the weight-scaling terms are bounded between 0 and 1.

As discussed in chapter 6, the temporal-difference algorithm that is used to train the AC unit depends critically on having a representation of prior input information at the point at which reward is received. It uses this information to learn which stimuli occurring earlier in time reliably predict reward later in time. An important feature of the present model is that this prior input information is actually maintained in active memory (and not via a contrivance like the CSC used in chapter 6), so it should be available at the point of reward.

However, the fact that this information is maintained in the PFC causes the following catch-22 problem. The AC controls when the PFC gets updated by detecting when a given stimulus is predictive of reward. At the time of reward, the stimulus is represented in the PFC, and the AC learns that this PFC representation is predictive of reward. However, at the start of any subsequent trials involving this rewarded stimulus, the stimulus is only present in the input and not in the PFC, because the inactive AC unit needs to be activated to get the stimulus into the PFC. Thus, the PFC needs to be updated with the current input for the AC to recognize the input as predictive of reward, but the AC unit needs to recognize the input as predictive of reward for the stimulus to get into the PFC in the first place!

One possible solution to this problem would be to ensure that the hidden layer always represents the contents of the PFC (e.g., by having relatively strong top-down projections). However, this is incompatible with the need to use the hidden layer for ongoing processing; in the present task, the hidden layer needs to represent the intervening "ignore" stimuli between the storage event and the recall event to be able to produce the correct outputs on these trials.

The solution that we have adopted is to generalize the same dopaminergic gain modulation mechanism used for the inputs to the PFC to the outputs from the PFC. Thus, at the point when reward is received (or anticipated), the PFC will drive the hidden units with sufficient strength that they will reflect the information stored in the PFC, and the AC will learn that these hidden representations are predictive of reward. Then, at the start of a subsequent storage trial, the hidden units will trigger the firing of the AC and enable the PFC to be updated with the appropriate information. To implement this, we introduce a new dynamic weight-scaling term, s_out, that is updated just as in equation 9.2. Further computational motivations for the use of output gating can be found in Hochreiter and Schmidhuber (1997).

Another possible solution to the catch-22 problem is to have the PFC always represent the current input in addition to whatever information is being maintained. This way, the input-driven PFC representations can directly trigger the AC system, which then fires a dopamine signal to "lock in" those representations, which are thus the same ones that will be active later.
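The bootstrapping role of output gating can be caricatured in a few lines of Python. Everything here is a deliberate simplification (hypothetical names, set-based "representations", an all-or-nothing AC response), not the model's actual implementation:

```python
# Illustrative sketch of the output-gating bootstrap: the AC learns about
# hidden-layer representations, and the output gate lets PFC contents
# reach the hidden layer at the moment of reward.

ac_predicts_reward = set()   # hidden representations the AC has learned about
pfc = None                   # content of PFC active maintenance

def trial(stimulus, reward=False):
    global pfc
    hidden = {stimulus}      # hidden layer is driven by the current input
    # AC fires if reward arrives, or if a hidden rep is known to predict it:
    delta_ac = 1.0 if reward or (hidden & ac_predicts_reward) else 0.0
    if delta_ac > 0:
        pfc = stimulus                             # s_in high: stimulus gated into PFC
        ac_predicts_reward.update(hidden | {pfc})  # s_out high: PFC content reaches the
                                                   # hidden layer, where the AC learns it
    return delta_ac

# First rewarded trial: the stimulus enters the PFC only because the reward
# itself drives the AC; the AC then learns that "A" predicts reward.
trial("A", reward=True)

# Later storage trial: "A" alone now triggers the AC via the hidden layer,
# so the PFC is updated without any external reward signal.
assert trial("A") == 1.0 and pfc == "A"
```

The key point the sketch makes concrete is that the AC's learned associations are over hidden-layer representations, which the input can activate on its own; without the output gate, the AC would only ever have learned about PFC-internal states that the input cannot reach.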