The next trial should be an “ignore” trial, where the I
input is activated in conjunction with another stimulus
unit.
On some occasions, the contents of a store trial will be encoded in the PFC, and the network will produce the correct output just by chance. When this happens, the weights into the AC unit are incremented for those units that were active. Let's watch the progress of learning.
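To make this weight-update logic concrete before stepping through the trials, here is a minimal sketch of a reward-driven, delta-rule style update of the weights into the AC unit; the function name, the variables, and the exact form of the rule are illustrative assumptions, not the simulator's actual equations.

    # Hedged sketch: a delta-rule style update of the weights into the AC unit.
    # 'reward' is the external reward on this trial (e.g., 1 for a correct output),
    # 'ac_prediction' is the AC unit's own reward prediction (its minus-phase
    # activation), and 'inputs' are the activations of the sending units.
    def update_ac_weights(weights, inputs, reward, ac_prediction, lrate=0.1):
        delta = reward - ac_prediction            # reward relative to expectation
        for i, act in enumerate(inputs):
            weights[i] += lrate * delta * act     # only active senders change
        return weights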
Step through the three phases of this trial and those
of any subsequent “ignore” trials, until you see the R in-
put unit activated.
This signals a recall trial. In the minus phase of this trial, the network should have activated the output unit of the stimulus that was originally stored. Because that stimulus was not actually stored (and because the Hidden2 layer would not know how to produce the right activation even if it had been), the output is likely to be wrong.
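For reference, the overall structure of one training sequence, as described above, looks roughly like this; the particular stimuli and the assumed target outputs are illustrative only.

    # Hedged sketch of one store-ignore-recall sequence.
    # Each trial pairs a task cue (S, I, or R) with the output the network
    # should produce; the number of ignore trials can vary.
    sequence = [
        ("S + stimulus A", "A"),   # store trial: respond with A and gate it into PFC
        ("I + stimulus B", "B"),   # ignore trial: respond with B but do not store it
        ("R",              "A"),   # recall trial: respond with the stored stimulus (A)
    ]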
Do View, EPOCH_LOG to open a log for monitoring learning performance. Then press r.wt and select the AC unit, so we can watch its weights learn. Finally, do Run and watch the weights.
You will see them increment slowly. Notice that, after an initial increase on the R units, the weights from the S units in both the Hidden and PFC layers increment the most. This is because, although different stimuli can be stored, the S cue is always present on a successful storage trial, and so has its weights incremented correspondingly more often. As the Hidden2 layer gets better at interpreting the (randomly) stored PFC representations, correct outputs on the recall trials will be produced more frequently, and the frequency of weight updates will increase.
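To get a rough sense of why the S weights dominate, suppose (purely for illustration) that there are four possible stimuli: the S cue is active on every successful storage trial, so its weights into the AC unit are incremented on all of them, whereas any single stimulus unit is active on only about a quarter of those trials and therefore receives only about a quarter as many increments.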
After every epoch of 25 store-ignore-recall se-
quences, the graph log will update. The red line shows
the standard sum-squared-error for the outputs the net-
work produces. The yellow line shows a count of the
number of “negative” rewards given to the network on
recall trials (i.e., trials where the correct output was not
produced). Thus, both of these error measures will be
high at the start, but should decrease as the network's
performance improves.
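For reference, the sum-squared-error statistic plotted by the red line has the standard form, summing over output units $k$ with target $t_k$ and actual output $o_k$ (and, as plotted here, accumulating over the trials in an epoch):

$\mathrm{SSE} = \sum_k (t_k - o_k)^2$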
After continued training, something important hap-
pens — the weights from the S (store) unit in the
Hidden layer to the AC unit get strong enough to reli-
ably activate the AC unit when the store unit is active.
This means that the stimulus information will be reli-
ably stored in active maintenance in the PFC on each
trial, and this will clearly lead to even better perfor-
mance and tuning of both the AC unit and Hidden2
layer weights. This improvement is reflected in the
yellow line of the graph log plot, which rapidly accel-
erates downwards, and should reach zero within 6 or
so epochs. Training will stop automatically after two
epochs of perfect recall trial performance.
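One way to picture this transition is as a threshold crossing by the AC unit; the threshold form, the Gaussian noise, and the names below are illustrative assumptions rather than the simulator's exact mechanism.

    # Hedged sketch: the AC unit opens the PFC gate when its input from the
    # S (store) unit, plus exploratory noise, exceeds a threshold. Early in
    # training w_store is weak and gating is rare and noise-driven; once the
    # weight has grown, the S input alone is enough to gate reliably.
    import random

    def gate_pfc(w_store, store_active, threshold=0.5, noise_sd=0.2):
        net = w_store * store_active + random.gauss(0.0, noise_sd)
        return net > threshold   # True means: update PFC with the current input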
Now, let's examine the trained network's perfor-
mance in detail.
Step into the first plus phase.
The AC unit will have a reward value clamped on it. In this case, the reward will be 0 (actually a very small number, 1e-12, as an actual 0 indicates the absence of any reward information at all) because the network did not produce the correct output. Thus, not much progress was made on this sequence, but it does provide a concrete instantiation of the task.
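The 0 versus 1e-12 convention is simply a way to distinguish "no reward information on this trial" from "an actual reward of zero"; here is a minimal sketch of that encoding (the value of 1 for a correct output is an assumption).

    # Hedged sketch of the reward-coding convention described in the text.
    NO_REWARD_INFO = 0.0     # exact zero: no reward information at all
    ZERO_REWARD    = 1e-12   # reward was delivered, and its value was (effectively) zero

    def clamped_ac_value(reward_given, output_correct):
        if not reward_given:
            return NO_REWARD_INFO
        return 1.0 if output_correct else ZERO_REWARD   # 1.0 is an assumed "correct" reward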
Continue to Step through trials until you see a cou-
ple of cases where the random gating noise causes the
network to update the PFC representations (i.e., you see
the PFC units become activated).
This can happen on any type of trial. This random ex-
ploration of what information to hold on to is an essen-
tial characteristic of this model, and of reinforcement-
based learning mechanisms in general. We will discuss
this aspect of the model later.
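In terms of the gate_pfc sketch given earlier, this exploration corresponds to the noise term occasionally opening the gate while the learned weight is still weak; a small, purely illustrative check of that behavior:

    # Using the illustrative gate_pfc sketch from above: with a weak weight the
    # gate opens only rarely (noise-driven exploration); with a strong weight it
    # opens almost every time the store cue is present.
    random.seed(0)
    early = sum(gate_pfc(0.1, 1.0) for _ in range(1000)) / 1000   # rare gating
    late  = sum(gate_pfc(1.0, 1.0) for _ in range(1000)) / 1000   # reliable gating
    print(early, late)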
Note that after the network receives an external re-
ward signal, it will automatically reset the AC unit, and
this will deactivate any active PFC representations. This
is due to the absorbing reward mechanism discussed in
chapter 6. This resetting mechanism is obviously nec-
essary to allow the network to gate in new information
on subsequent trials. We know from neural recording
studies that PFC active memory representations are de-
activated just after they are needed (e.g., Fuster, 1989;
Goldman-Rakic, 1987), but exactly how this deactiva-
tion takes place at a biological level is not known. This
model would suggest that it might be a reflection of the
absorbing reward mechanism.
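A minimal sketch of this resetting, assuming the mechanism simply clears the AC unit and any maintained PFC activity whenever an external reward signal arrives (the names and structure are illustrative):

    # Hedged sketch: after an external reward is received (the absorbing reward),
    # the AC unit is reset and maintained PFC activity is cleared, so that new
    # information can be gated in on subsequent trials.
    def absorb_reward(ac_activation, pfc_maintained, reward_received):
        if reward_received:
            ac_activation = 0.0    # reset the adaptive critic
            pfc_maintained = []    # deactivate maintained PFC representations
        return ac_activation, pfc_maintained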
The initial phase of training in the network serves
mainly to train up the Hidden2 layer to produce the
correct output on the store and ignore trials. However,