Now, click back on r.wt in the network, and let this process play out by doing Continue on the process control panel. Stop the training when the graph log stops changing.
You will see the anticipation creep forward both in the weights and in the graph log, ultimately resulting in activation of the AC unit when the stimulus first comes on at t = 10. This is the same process that was shown in figure 6.20, and it represents the heart of the TD algorithm.
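The backward creep of the prediction can be reproduced outside the simulator with a few lines of Python. The following is only a minimal sketch, not the simulator's implementation: it assumes one input unit per time step of the stimulus, a linear value estimate, a TD error written as delta(t) = r(t) + gamma * V(t) - V(t-1), and illustrative values for the discount factor and learning rate. Only the stimulus onset (t = 10) and the reward time (t = 16) come from the text.

```python
import numpy as np

T = 20         # time steps per trial (assumed)
STIM_ON = 10   # stimulus onset (from the text)
US_T = 16      # reward time (from the text)
GAMMA = 1.0    # discount factor (assumed)
LRATE = 0.3    # learning rate (assumed)

def make_inputs():
    # x[t, k] = 1 when the stimulus has been on for k steps at time t
    x = np.zeros((T, US_T - STIM_ON))
    for t in range(STIM_ON, US_T):
        x[t, t - STIM_ON] = 1.0
    return x

def run_trial(w, x, reward=1.0):
    """Run one trial, update w in place, return the TD error at each step."""
    delta = np.zeros(T)
    for t in range(1, T):
        r = reward if t == US_T else 0.0
        v_prev, v_now = w @ x[t - 1], w @ x[t]
        delta[t] = r + GAMMA * v_now - v_prev
        # Credit the inputs that were active on the previous step.
        w += LRATE * delta[t] * x[t - 1]
    return delta

x = make_inputs()
w = np.zeros(x.shape[1])
first = run_trial(w, x)
for _ in range(200):
    last = run_trial(w, x)

# Time step with the largest TD error on the first and last trial.
print(int(np.argmax(first)), int(np.argmax(last)))  # prints: 16 10
```

On the first trial the error appears only at the reward; after training it appears at stimulus onset, and the reward itself is fully predicted.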
At this point, there are many standard phenomena in classical conditioning that can be explored with this model. We will look at two: extinction and second order conditioning. Extinction occurs when the stimulus is no longer predictive of reward; it then loses its ability to predict this reward (which is appropriate). Second order conditioning, as we discussed earlier, is where a conditioned stimulus can serve as the unconditioned stimulus for another stimulus; in other words, one can extend the prediction of reward backward across two separate stimuli.
We can simulate extinction by simply turning off the reward that appears at t = 16. To do this, we need to alter the parameters on the control panel that determine the nature of the stimulus input and reward. First, to familiarize yourself with the controls, look at the stim_1 field, which controls the timing of the first stimulus: t_on is the time at which the stimulus comes on, and len is how long it stays on. The two var fields provide for variance around these points, which has been zero so far (you can explore these on your own later). The timing parameters for reward are in the us (unconditioned stimulus) field. Although these fields determine the timing of the stimuli, another mechanism controls their probability of coming on at all. These probabilities are what we want to manipulate. The master control for these probabilities is contained in the probs field, but we will use a shortcut through the StdProbs button.
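The separation between when an event happens and whether it happens at all can be made concrete with a small sketch. Everything here is hypothetical and is not the simulator's data structure: the `Event` class, the `t_on_var` and `len_var` names, the uniform integer jitter, and the one-step reward duration are assumptions made for illustration. Only `t_on`, the idea of a length and a variance, the reward time of 16, and the second stimulus timing (onset 2, length 8) come from the text.

```python
import random
from dataclasses import dataclass

@dataclass
class Event:
    t_on: int          # time at which the event comes on
    length: int        # how long it stays on (the panel's `len` field)
    t_on_var: int = 0  # jitter around t_on (hypothetical name)
    len_var: int = 0   # jitter around length (hypothetical name)
    prob: float = 1.0  # probability that the event occurs on a trial

def sample_trial(ev, n_steps=20, rng=random):
    """Return a 0/1 list marking the steps on which the event is active."""
    active = [0] * n_steps
    if rng.random() >= ev.prob:
        return active  # the event does not occur on this trial
    # Assumed jitter model: uniform integer offsets within +/- var.
    t_on = ev.t_on + rng.randint(-ev.t_on_var, ev.t_on_var)
    length = ev.length + rng.randint(-ev.len_var, ev.len_var)
    for t in range(max(t_on, 0), min(t_on + length, n_steps)):
        active[t] = 1
    return active

us = Event(t_on=16, length=1)                 # reward present
no_us = Event(t_on=16, length=1, prob=0.0)    # same timing, never occurs
stim_2 = Event(t_on=2, length=8)

print(sample_trial(us).index(1), sum(sample_trial(no_us)))  # prints: 16 0
```

Setting the probability to zero leaves the timing untouched, which is the kind of change the extinction manipulation needs.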
Question 6.5 (a) What happened at the point where the reward was supposed to occur? (b) Explain why this happened using the TD equations. (c) Then, Continue the network and describe what occurs next in terms of the TD error signals plotted in the graph log, and explain why TD does this. (d) After the network is done learning again, does the stimulus still evoke an expectation of reward?
One thing you might have noticed is that during this extinction procedure, the weights are not reduced back to zero. Indeed, they are reduced only enough to bring the AC unit below threshold. The effects of this threshold may not be applicable to the real brain because it appears that the AC unit is constantly active at a low level, so either some additional inputs are driving it or the resting potential and threshold are effectively much closer than in this simulation. Thus, we might expect that the weights would have to be reduced much more to bring the AC unit below threshold. However, if the behavior did suggest that extinction was not complete (as it does in at least some situations), then this kind of threshold effect may be at work.
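The threshold effect described above can be illustrated with a deliberately reduced sketch. It is not the simulator's AC unit: it assumes a single weight, a prediction that is zero until the weight exceeds a threshold, and a trial collapsed to one prediction step so that the TD error is simply the reward minus the prediction. The threshold and learning rate values are arbitrary.

```python
THETA = 0.5   # activation threshold (assumed value)
LRATE = 0.3   # learning rate (assumed value)

def value(w):
    """Thresholded prediction: zero until the weight exceeds THETA."""
    return max(0.0, w - THETA)

def train(w, reward, n_trials):
    """Apply the error-driven update for n_trials and return the new weight."""
    for _ in range(n_trials):
        delta = reward - value(w)  # TD error when no later state follows
        w += LRATE * delta
    return w

w_acquired = train(0.0, reward=1.0, n_trials=100)             # acquisition
w_extinguished = train(w_acquired, reward=0.0, n_trials=100)  # extinction

print(round(w_acquired, 3), round(w_extinguished, 3),
      round(value(w_extinguished), 3))  # prints: 1.5 0.5 0.0
```

Once the prediction reaches zero the error vanishes, so learning stops with the weight sitting at the threshold rather than at zero.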
Now, let's explore second order conditioning. We must first retrain the network on the stimulus 1 association. Press StdProbs and select STIM1_US, and then NewRun (which re-initializes the network), letting it train until the onset of the stimulus is clearly driving the expectation of reward.
Now, we will turn on the second stimulus, which starts at t = 2 and lasts for 8 time steps (as you can see from the stim_2 field in the control panel).
Do this by selecting StdProbs and then STIM1_2_US. Go back to viewing act if you aren't already, and Step through the trial. Then, go back and look at the weights. Essentially, the first stimulus acts just like a reward by triggering a positive δ(t), and thus allows the second stimulus to learn to predict this first stimulus.
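Why the first stimulus can stand in for a reward follows directly from the form of the TD error. The worked step below is a sketch under stated assumptions: the error is indexed as the change in prediction arriving at time t, the symbols (gamma for the discount factor, epsilon for the learning rate, x_i for the inputs from the second stimulus) are illustrative, and the indexing may differ from the chapter's own equations. The onset of stimulus 1 at t = 10 and the activity of stimulus 2 through t = 9 come from the text.

```latex
% Assumed form of the TD error:
\delta(t) = r(t) + \gamma \hat{V}(t) - \hat{V}(t-1)

% At the onset of stimulus 1 (t = 10) there is no reward, the trained
% prediction \hat{V}(10) is positive, and the weights from stimulus 2
% are still zero, so \hat{V}(9) = 0:
\delta(10) = 0 + \gamma \hat{V}(10) - 0 = \gamma \hat{V}(10) > 0

% Stimulus 2 is still active at t = 9, so its weights increase:
\Delta w_i = \epsilon \, \delta(10) \, x_i(9) > 0
```

The positive error at t = 10 plays the role that the reward played during first-order training, and repeated trials carry the prediction back toward the onset of the second stimulus.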
Push Continue, and then Stop when the graph log stops changing.
You will see that the early anticipation of reward gets carried out to the onset of the second stimulus (which comes first in time).

For the extinction simulation examined in Question 6.5: press the StdProbs button, and select STIM1_NO_US. This indicates that stimulus 1 will be presented but no US. Now, Clear the graph log and Step through a trial.