the value for that unit. Values of exactly 0 indicate no
reward information at all, not the absence of reward.)
Press act in the network window to view the plus
phase activations, where you can see the active PFC
units. Press StepSettle several times, observing the
AC unit activation in the minus and plus phases. Stop
when you see the other dimension getting activated in
the input; the network is moving on to the second
training block at this point.
You should notice for the next several trials that the
network continues to perform correctly, so that the AC
unit is activated in the plus phase, but it is not activated
in the minus phase because that depends on learning to
predict or expect the reward. This learning is taking
place, and early in the next block it will come to expect
this reward in the minus phase.
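
The way the minus-phase expectation catches up with the plus-phase reward can be illustrated with a minimal sketch. This is not the simulator's code: the single weight w, the learning rate lrate, and the simple delta rule are assumptions chosen only to show the TD signal shrinking as the reward comes to be expected.

```python
def train_reward_expectation(n_trials, lrate=0.3, reward=1.0):
    """Return one (minus, plus, td) tuple per rewarded trial."""
    w = 0.0  # hypothetical weight driving the AC unit's expectation
    history = []
    for _ in range(n_trials):
        minus = w          # minus phase: AC activation is the expected reward
        plus = reward      # plus phase: AC activation reflects the actual reward
        td = plus - minus  # TD signal: difference across the two phases
        w += lrate * td    # delta rule (an assumption): move expectation toward reward
        history.append((minus, plus, td))
    return history


history = train_reward_expectation(10)
for trial, (minus, plus, td) in enumerate(history, start=1):
    print(f"trial {trial:2d}  minus={minus:.3f}  plus={plus:.3f}  td={td:.3f}")
```

On the first trial the AC unit is silent in the minus phase and the TD signal is at its largest; over later trials the minus-phase value approaches the plus-phase value and the TD signal approaches zero.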
At this point, the network is responding correctly to
the location of the first feature, and so has completed
the first training block. However, you may have noticed
that there is not necessarily a clear correspondence in
the feature-level PFC to this target stimulus for the
initial training block. This is because the second
stimulus is actually perfectly anticorrelated with the
first one in location, so the network could just as easily
learn “press in the opposite direction of the second
stimulus.” Monkeys (and people) probably have biases to
learn the more direct “press in the same direction as the
first stimulus,” but this bias is not captured in the
model. Nevertheless, this bias turns out not to be
critical, as we shall see.
To monitor the performance of the network on this
task, it is useful to check the Epoch_0_GraphLog,
which contains a plot of the error over epochs. You can
see that the network error has descended to zero during
this first training block, and completed one additional
epoch with zero error, indicating that it is time to move
on to the next block.
We next present the second training block, where the
features from the other dimension (B) are included, but
the target remains the same (the first feature in
dimension A). The network has no difficulty with this
problem. It should complete the block to criterion after the
minimum of 2 epochs. During this time, the AC unit
will come to strongly expect the reward signal in the
minus phase.
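
The block criterion described above can be sketched as follows, under the assumption that it amounts to two consecutive epochs with zero error (the first zero-error epoch plus one additional one). The function name and the list of per-epoch error counts are illustrative and not part of the simulator.

```python
def epochs_to_criterion(epoch_errors, n_required=2):
    """Return the 1-based epoch at which the block ends, or None if it never does.

    epoch_errors: error count for each epoch of the block, in order.
    n_required: consecutive zero-error epochs needed (assumed to be 2).
    """
    run = 0  # length of the current run of zero-error epochs
    for epoch, err in enumerate(epoch_errors, start=1):
        run = run + 1 if err == 0 else 0
        if run >= n_required:
            return epoch
    return None


print(epochs_to_criterion([3, 1, 0, 0]))  # errors fall to zero, then one more clean epoch
print(epochs_to_criterion([0, 0]))        # no errors at all: the minimum of 2 epochs
```

Under this reading, a block in which the network never errs still takes the minimum of 2 epochs, which matches the behavior expected in the second training block.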
Press StepSettle again to get to the second plus
phase. Note that the network window displays all the
counter information, so you can more easily keep track
of where you are — phase_no should be 2 now.
Recall from chapter 9 that this second plus phase is
viewed as part of one overall plus phase, but gets
separated out for ease of determining first the change in
the AC unit activation over the initial minus-plus phase
set, and then the consequences for the PFC. You should
see that in this case the PFC units are not updated,
because the temporal-differences (TD) signal across the AC
unit minus and plus phases was zero, meaning that the
hidden-to-PFC weights were not transiently
strengthened. Note that there is noise added on top of this
basic TD signal, but, because we are using the same
set of random numbers (via the ReInit function), we
know that this noise was negative and did not activate
the PFC.
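
The logic of this step can be sketched roughly as below. The function names, the additive noise term, and the rule that only a positive gating value updates the PFC are assumptions made for illustration; the simulator's actual gating computation may differ.

```python
def gating_signal(ac_minus, ac_plus, noise):
    """TD across the AC unit's minus and plus phases, plus a noise term."""
    td = ac_plus - ac_minus
    return td + noise


def pfc_is_updated(gate):
    # Assumption: only a positive gating signal transiently strengthens
    # the hidden-to-PFC weights enough to update the PFC.
    return gate > 0.0


# No reward and no expectation: TD is zero, so the noise alone sets the sign.
no_reward = gating_signal(ac_minus=0.0, ac_plus=0.0, noise=-0.1)
# Unexpected reward: TD is large and dominates the same noise value.
rewarded = gating_signal(ac_minus=0.0, ac_plus=1.0, noise=-0.1)

print(no_reward, pfc_is_updated(no_reward))
print(rewarded, pfc_is_updated(rewarded))
```

With a zero TD signal and negative noise the gating value stays negative and the PFC is left alone, whereas a positive TD signal of about 1 easily outweighs noise of this size.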
You can see the actual TD values used for gating the
PFC units in the PFC_td values displayed near the
respective PFC layers (both should be negative).
Although random noise in the gating signal can activate
the PFC units by chance, it turns out that on this run,
the units will only get activated when the network
produces the correct output.
Press StepTrial (which steps through all three
phases of one event) a few times, until you see the PFC
units get activated.
When the network produces the correct output, the
AC unit then receives a positive reward value in the first
plus phase. To see that the network did produce the
correct output in the minus phase, we can look at the
minus phase activations.
Press act_m in the network window to view the
minus phase activations.
You should see that the right output unit is active
above 0.5. The reward provided to the AC unit for
producing the correct output produces a positive TD
signal, which modulates the strength of the
hidden-to-PFC weights by a large amount, as shown in
the PFC_td values, which should both be around 1.
This large input strength causes the PFC units to now
represent the most strongly active hidden units.
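
One way to picture how a large gating value lets the PFC take on the hidden layer's pattern is the sketch below. It is a deliberate simplification, not the model's mechanism: the one-to-one hidden-to-PFC connections, the threshold, and the single-winner selection (k=1) are all assumptions.

```python
def pfc_after_gating(pfc_act, hidden_act, gate, k=1, threshold=0.5):
    """Return the PFC activations after one gating event.

    gate scales the hidden-to-PFC input (negative values count as zero).
    If the scaled input is too weak, the PFC keeps its previous pattern;
    otherwise the k most strongly driven PFC units become active.
    """
    gain = max(gate, 0.0)
    net = [gain * h for h in hidden_act]  # assumed one-to-one projection
    if max(net, default=0.0) < threshold:
        return list(pfc_act)              # input too weak: maintain prior state
    winners = sorted(range(len(net)), key=lambda i: net[i], reverse=True)[:k]
    return [1.0 if i in winners else 0.0 for i in range(len(net))]


hidden = [0.2, 0.9, 0.4]
print(pfc_after_gating([0.0, 0.0, 0.0], hidden, gate=1.0))   # gate open
print(pfc_after_gating([1.0, 0.0, 0.0], hidden, gate=-0.1))  # gate closed
```

When the gating value is around 1 the PFC adopts the unit corresponding to the strongest hidden activation; when it is negative the PFC simply holds whatever it was already representing.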
StepTrial through the second training block,
stopping when the patterns in the input change (you can