There are three different qualitative directions of AC change (increase, no change, and decrease) that have corresponding effects on the PFC. If the AC unit predicts that the current inputs will lead to future reward, it will encode these inputs in the active maintenance of the PFC. If there is no change in AC activity (i.e., no increase or decrease in expected rewards), then the modulatory gain on the PFC inputs is low, so that the information maintained in the PFC is undisturbed by ongoing processing (e.g., the "ignore" stimuli). If there is a decrease in dopamine (negative change in AC activity), then the modulatory gain on the input weights remains low, and the modulatory gain on the recurrent weights that implement maintenance in the PFC is also decreased, so that information in the PFC is deactivated. As reviewed previously, there is at least suggestive evidence for these kinds of effects of dopamine on the PFC.
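As a rough sketch of this three-way gating rule (the function and variable names here are hypothetical; only the base levels and the clipping to the 0-1 range are taken from the text), the qualitative behavior can be written as:

```python
def pfc_gains(delta_ac, noise=0.0, b_in=0.0, b_maint=1.0):
    """Compute PFC input and maintenance gains from the change in AC activation.

    delta_ac > 0:  reward predicted -> input gain rises, current input is stored.
    delta_ac == 0: both gains at base levels -> maintained info is protected.
    delta_ac < 0:  maintenance gain falls -> stored info is deactivated.
    """
    clip = lambda x: max(0.0, min(1.0, x))  # gains are bounded between 0 and 1
    s_in = clip(b_in + delta_ac + noise)
    s_maint = clip(b_maint + delta_ac + noise)
    return s_in, s_maint

# The three qualitative directions of AC change:
print(pfc_gains(+0.5))  # increase:  (0.5, 1.0) -> gate input in, keep maintenance
print(pfc_gains(0.0))   # no change: (0.0, 1.0) -> ignore input, maintain
print(pfc_gains(-0.5))  # decrease:  (0.0, 0.5) -> input closed, maintenance weakened
```

With the base levels at 0 and 1, a single signed quantity (the change in AC activation) is enough to produce all three regimes: storage, protected maintenance, and deactivation.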
The AC-PFC relationship is formalized in the model with the following equations for the absolute weight-scale terms s_in (the weight scaling of the PFC inputs) and s_maint (the weight scaling of the PFC self-maintenance connections):

    s_in = b_in + δ + ε        (9.1)

    s_maint = b_maint + δ + ε  (9.2)

where δ is the change in AC activation (see equation 6.5), and ε is a random noise value that allows for random trial-and-error exploration during the initial phases of learning. The base-level parameters b_in and b_maint determine the basic level of each weight-scaling (gain) parameter, and are typically set to 0 and 1, respectively. Both of the weight-scaling terms are bounded between 0 and 1.

As discussed in chapter 6, the temporal-difference algorithm that is used to train the AC unit depends critically on having a representation of prior input information at the point at which reward is received. It uses this information to learn which stimuli occurring earlier in time reliably predict reward later in time. An important feature of the present model is that this prior input information is actually maintained in active memory (and not via a contrivance like the CSC used in chapter 6), so it should be available at the point of reward.

However, the fact that this information is maintained in the PFC causes the following catch-22 problem. The AC controls when the PFC gets updated by detecting when a given stimulus is predictive of reward. At the time of reward, the stimulus is represented in the PFC, and the AC learns that this PFC representation is predictive of reward. However, at the start of any subsequent trials involving this rewarded stimulus, the stimulus is only present in the input and not in the PFC, because the inactive AC unit needs to be activated to get the stimulus into the PFC. Thus, the PFC needs to be updated with the current input for the AC to recognize the input as predictive of reward, but the AC unit needs to recognize the input as predictive of reward for the stimulus to get into the PFC in the first place!

One possible solution to this problem would be to ensure that the hidden layer always represents the contents of the PFC (e.g., by having relatively strong top-down projections). However, this is incompatible with the need to use the hidden layer for ongoing processing; in the present task, the hidden layer needs to represent the intervening "ignore" stimuli between the storage event and the recall event to be able to produce the correct outputs on these trials.

The solution that we have adopted is to generalize the same dopaminergic gain modulation mechanism used for the inputs to the PFC to the outputs from the PFC. Thus, at the point when reward is received (or anticipated), the PFC will drive the hidden units with sufficient strength that they will reflect the information stored in the PFC, and the AC will learn that these hidden representations are predictive of reward. Then, at the start of a subsequent storage trial, the hidden units will trigger the firing of the AC and enable the PFC to be updated with the appropriate information. To implement this, we introduce a new dynamic weight-scaling term, s_out, that is updated just as in equation 9.2. Further computational motivations for the use of output gating can be found in Hochreiter and Schmidhuber (1997).

Another possible solution to the catch-22 problem is to have the PFC always represent the current input in addition to whatever information is being maintained. This way, the input-driven PFC representations can directly trigger the AC system, which then fires a dopamine signal to "lock in" those representations, which are thus the same ones that will be active later.
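The bootstrapping role of output gating can be caricatured in a few lines of Python. Everything here is a deliberate simplification (hypothetical names, set-based "representations", an all-or-nothing AC response), not the model's actual implementation:

```python
# Illustrative sketch of the output-gating bootstrap: the AC learns about
# hidden-layer representations, and the output gate lets PFC contents
# reach the hidden layer at the moment of reward.

ac_predicts_reward = set()   # hidden representations the AC has learned about
pfc = None                   # content of PFC active maintenance

def trial(stimulus, reward=False):
    global pfc
    hidden = {stimulus}      # hidden layer is driven by the current input
    # AC fires if reward arrives, or if a hidden rep is known to predict it:
    delta_ac = 1.0 if reward or (hidden & ac_predicts_reward) else 0.0
    if delta_ac > 0:
        pfc = stimulus                             # s_in high: stimulus gated into PFC
        ac_predicts_reward.update(hidden | {pfc})  # s_out high: PFC content reaches the
                                                   # hidden layer, where the AC learns it
    return delta_ac

# First rewarded trial: the stimulus enters the PFC only because the reward
# itself drives the AC; the AC then learns that "A" predicts reward.
trial("A", reward=True)

# Later storage trial: "A" alone now triggers the AC via the hidden layer,
# so the PFC is updated without any external reward signal.
assert trial("A") == 1.0 and pfc == "A"
```

The key point the sketch makes concrete is that the AC's learned associations are over hidden-layer representations, which the input can activate on its own; without the output gate, the AC would only ever have learned about PFC-internal states that the input cannot reach.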