The next trial should be an “ignore” trial, where the I
input is activated in conjunction with another stimulus
unit.
On some occasions, the contents of a store trial will be encoded in the PFC, and the network will produce the correct output just by chance. When this happens, the weights into the AC unit are incremented for those units that were active. Let's watch the progress of learning.
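To make this weight-update logic concrete before stepping through the trials, here is a minimal sketch of a reward-driven, delta-rule style update of the weights into the AC unit; the function name, the variables, and the exact form of the rule are illustrative assumptions, not the simulator's actual equations.

    # Hedged sketch: a delta-rule style update of the weights into the AC unit.
    # 'reward' is the external reward on this trial (e.g., 1 for a correct output),
    # 'ac_prediction' is the AC unit's own reward prediction (its minus-phase
    # activation), and 'inputs' are the activations of the sending units.
    def update_ac_weights(weights, inputs, reward, ac_prediction, lrate=0.1):
        delta = reward - ac_prediction            # reward relative to expectation
        for i, act in enumerate(inputs):
            weights[i] += lrate * delta * act     # only active senders change
        return weights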
Step through the three phases of this trial and those
of any subsequent “ignore” trials, until you see the R in-
put unit activated.
This signals a recall trial. In the minus phase of this trial, the network should have activated the output unit of the stimulus that was originally stored. Because that stimulus was not actually stored (and because the Hidden2 layer would not know how to produce the right activation even if it had been), the output is likely to be wrong.
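For reference, the overall structure of one training sequence, as described above, looks roughly like this; the particular stimuli and the assumed target outputs are illustrative only.

    # Hedged sketch of one store-ignore-recall sequence.
    # Each trial pairs a task cue (S, I, or R) with the output the network
    # should produce; the number of ignore trials can vary.
    sequence = [
        ("S + stimulus A", "A"),   # store trial: respond with A and gate it into PFC
        ("I + stimulus B", "B"),   # ignore trial: respond with B but do not store it
        ("R",              "A"),   # recall trial: respond with the stored stimulus (A)
    ]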
Do View, EPOCH_LOG to open a log for monitoring learning performance. Then press r.wt and select the AC unit, so we can watch its weights learn. Finally, do Run and watch the weights.
You will see them increment slowly. Notice that, after an initial increase on the R units, the weights from the S units in both the Hidden and PFC layers increment the most. This is because, although different stimuli can be stored, the S cue is always present on a successful storage trial, and so has its weights incremented correspondingly more often. As the Hidden2 layer gets better at interpreting the (randomly) stored PFC representations, correct outputs on the recall trials will be produced more frequently, and the frequency of weight updates will increase.
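To get a rough sense of why the S weights dominate, suppose (purely for illustration) that there are four possible stimuli: the S cue is active on every successful storage trial, so its weights into the AC unit are incremented on all of them, whereas any single stimulus unit is active on only about a quarter of those trials and therefore receives only about a quarter as many increments.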
After every epoch of 25 store-ignore-recall se-
quences, the graph log will update. The red line shows
the standard sum-squared-error for the outputs the net-
work produces. The yellow line shows a count of the
number of “negative” rewards given to the network on
recall trials (i.e., trials where the correct output was not
produced). Thus, both of these error measures will be
high at the start, but should decrease as the network's
performance improves.
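For reference, the sum-squared-error statistic plotted by the red line has the standard form, summing over output units $k$ with target $t_k$ and actual output $o_k$ (and, as plotted here, accumulating over the trials in an epoch):

$\mathrm{SSE} = \sum_k (t_k - o_k)^2$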
After continued training, something important hap-
pens — the weights from the S (store) unit in the
Hidden layer to the AC unit get strong enough to reli-
ably activate the AC unit when the store unit is active.
This means that the stimulus information will be reli-
ably stored in active maintenance in the PFC on each
trial, and this will clearly lead to even better perfor-
mance and tuning of both the AC unit and Hidden2
layer weights. This improvement is reflected in the
yellow line of the graph log plot, which rapidly accel-
erates downwards, and should reach zero within 6 or
so epochs. Training will stop automatically after two
epochs of perfect recall trial performance.
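One way to picture this transition is as a threshold crossing by the AC unit; the threshold form, the Gaussian noise, and the names below are illustrative assumptions rather than the simulator's exact mechanism.

    # Hedged sketch: the AC unit opens the PFC gate when its input from the
    # S (store) unit, plus exploratory noise, exceeds a threshold. Early in
    # training w_store is weak and gating is rare and noise-driven; once the
    # weight has grown, the S input alone is enough to gate reliably.
    import random

    def gate_pfc(w_store, store_active, threshold=0.5, noise_sd=0.2):
        net = w_store * store_active + random.gauss(0.0, noise_sd)
        return net > threshold   # True means: update PFC with the current input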
Now, let's examine the trained network's perfor-
mance in detail.
Step into the first plus phase.
The AC unit will have a reward value clamped on it. In this case, the reward will be 0 (actually a very small number, 1e-12, as an actual 0 indicates the absence of any reward information at all) because the network did not produce the correct output. Thus, not much progress was made on this sequence, but it does provide a concrete instantiation of the task.
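The 0 versus 1e-12 convention is simply a way to distinguish "no reward information on this trial" from "an actual reward of zero"; here is a minimal sketch of that encoding (the value of 1 for a correct output is an assumption).

    # Hedged sketch of the reward-coding convention described in the text.
    NO_REWARD_INFO = 0.0     # exact zero: no reward information at all
    ZERO_REWARD    = 1e-12   # reward was delivered, and its value was (effectively) zero

    def clamped_ac_value(reward_given, output_correct):
        if not reward_given:
            return NO_REWARD_INFO
        return 1.0 if output_correct else ZERO_REWARD   # 1.0 is an assumed "correct" reward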
Continue to Step through trials until you see a cou-
ple of cases where the random gating noise causes the
network to update the PFC representations (i.e., you see
the PFC units become activated).
This can happen on any type of trial. This random ex-
ploration of what information to hold on to is an essen-
tial characteristic of this model, and of reinforcement-
based learning mechanisms in general. We will discuss
this aspect of the model later.
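In terms of the gate_pfc sketch given earlier, this exploration corresponds to the noise term occasionally opening the gate while the learned weight is still weak; a small, purely illustrative check of that behavior:

    # Using the illustrative gate_pfc sketch from above: with a weak weight the
    # gate opens only rarely (noise-driven exploration); with a strong weight it
    # opens almost every time the store cue is present.
    random.seed(0)
    early = sum(gate_pfc(0.1, 1.0) for _ in range(1000)) / 1000   # rare gating
    late  = sum(gate_pfc(1.0, 1.0) for _ in range(1000)) / 1000   # reliable gating
    print(early, late)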
Note that after the network receives an external re-
ward signal, it will automatically reset the AC unit, and
this will deactivate any active PFC representations. This
is due to the absorbing reward mechanism discussed in
chapter 6. This resetting mechanism is obviously nec-
essary to allow the network to gate in new information
on subsequent trials. We know from neural recording
studies that PFC active memory representations are de-
activated just after they are needed (e.g., Fuster, 1989;
Goldman-Rakic, 1987), but exactly how this deactiva-
tion takes place at a biological level is not known. This
model would suggest that it might be a reflection of the
absorbing reward mechanism.
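A minimal sketch of this resetting, assuming the mechanism simply clears the AC unit and any maintained PFC activity whenever an external reward signal arrives (the names and structure are illustrative):

    # Hedged sketch: after an external reward is received (the absorbing reward),
    # the AC unit is reset and maintained PFC activity is cleared, so that new
    # information can be gated in on subsequent trials.
    def absorb_reward(ac_activation, pfc_maintained, reward_received):
        if reward_received:
            ac_activation = 0.0    # reset the adaptive critic
            pfc_maintained = []    # deactivate maintained PFC representations
        return ac_activation, pfc_maintained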
The initial phase of training in the network serves
mainly to train up the Hidden2 layer to produce the
correct output on the store and ignore trials. However,