Now, click back on r.wt in the network, and let this process play out by doing Continue on the process control panel. Stop the training when the graph log stops changing.
You will see the anticipation creep forward both in the weights and in the graph log, ultimately resulting in activation of the AC unit when the stimulus first comes on at t = 10. This is the same process that was shown in figure 6.20, and it represents the heart of the TD algorithm.
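The forward creep of the anticipation can be reproduced outside the simulator with a few lines of TD(0). This is a minimal sketch under stated assumptions, not the simulator's implementation: stimulus timing is coded by a tapped delay line (one hypothetical input unit per time step since stimulus onset), the stimulus is assumed to stay on for t = 10..15, the learning rate (0.3) and discount (γ = 1) are illustrative, and the TD error is indexed by the step at which new information arrives, δ(t) = r(t) + γV(t) − V(t−1). It reports when the largest TD error occurs on the first and on the 200th trial.

```python
# Minimal TD(0) sketch of acquisition: stimulus onset at t = 10, reward at t = 16.
# Assumptions (not the simulator's settings): tapped-delay-line inputs, a stimulus
# lasting 6 steps, learning rate 0.3, discount 1.0, trial length 20.

N_STEPS = 20      # hypothetical trial length
ONSET = 10        # stimulus onset (t = 10 in the text)
LENGTH = 6        # hypothetical: stimulus on for t = 10..15
REWARD_T = 16     # reward time (t = 16 in the text)


def active_unit(t):
    """Index of the delay-line unit active at time t, or None if the stimulus is off."""
    k = t - ONSET
    return k if 0 <= k < LENGTH else None


def value(w, t):
    """Linear value estimate V(t): weight of the active unit (0.0 if none)."""
    k = active_unit(t)
    return w[k] if k is not None else 0.0


def run_trial(w, lr=0.3, gamma=1.0):
    """Run one trial, updating w in place; return the TD error at each time step."""
    deltas = [0.0] * N_STEPS
    for t in range(1, N_STEPS):
        r = 1.0 if t == REWARD_T else 0.0
        # TD error indexed by the step at which the new information arrives
        delta = r + gamma * value(w, t) - value(w, t - 1)
        deltas[t] = delta
        # The unit that produced V(t-1) is the one whose weight gets corrected
        k = active_unit(t - 1)
        if k is not None:
            w[k] += lr * delta
    return deltas


def peak_time(deltas):
    """Time step with the largest TD error."""
    return max(range(len(deltas)), key=lambda t: deltas[t])


weights = [0.0] * LENGTH
first = run_trial(weights)
for _ in range(199):
    last = run_trial(weights)

print(peak_time(first))  # 16: before learning, the reward itself is the surprise
print(peak_time(last))   # 10: after learning, the surprise occurs at stimulus onset
```

On intermediate trials the positive TD error sits at successively earlier steps, which is the creep seen in the graph log; the delay-line coding is one simple choice that makes this step-by-step backup visible.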
At this point, there are many standard phenomena in classical conditioning that can be explored with this model. We will look at two: extinction and second-order conditioning. Extinction occurs when the stimulus no longer predicts reward: the stimulus then (appropriately) loses its ability to evoke an expectation of that reward. Second-order conditioning, as we discussed earlier, is where a conditioned stimulus serves as the unconditioned stimulus for another stimulus. In other words, the prediction of reward can be extended backward across two separate stimuli.
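Both phenomena are driven by the same TD error. As a reminder of where that error comes from, here is a short derivation under one common indexing convention (an assumption; the simulator's own equations may index time differently): V(t) is the discounted sum of rewards arriving after time t, \hat{V} is the network's estimate of it, γ is the discount factor, x_i(t−1) is the activity of input unit i on the previous step, and ε is a learning rate.

```latex
\begin{aligned}
V(t-1) &= \sum_{i=0}^{\infty} \gamma^{i}\, r(t+i)
        = r(t) + \gamma \sum_{i=0}^{\infty} \gamma^{i}\, r(t+1+i)
        = r(t) + \gamma\, V(t) \\
\delta(t) &= r(t) + \gamma\, \hat{V}(t) - \hat{V}(t-1) \\
\Delta w_i &= \epsilon\, \delta(t)\, x_i(t-1)
\end{aligned}
```

When the estimate is correct, δ(t) = 0. Any unexpected change, whether a primary reward r(t) or a jump in the predicted value \hat{V}(t), makes δ(t) nonzero and changes the weights of the inputs that were active just before it.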
We can simulate extinction by simply turning off the reward that appears at t = 16. To do this, we need to alter the parameters on the control panel that determine the nature of the stimulus input and reward. First, to familiarize yourself with the controls, look at the stim_1 field. This controls the timing of the first stimulus: t_on is the time at which the stimulus comes on, and len is how long it stays on. The two var fields add variance around these values; so far they have been set to zero (you can explore them on your own later). The timing parameters for the reward are in the us (unconditioned stimulus) field. Although these fields determine the timing of the stimuli, a separate mechanism controls the probability that each one comes on at all. These probabilities are what we want to manipulate. The master control for these probabilities is contained in the probs field, but we will use a shortcut through the StdProbs button.
Press the StdProbs button, and select STIM1_NO_US. This indicates that stimulus 1 will be presented but no US.
Now, Clear the graph log and Step through a trial.
Question 6.5 (a) What happened at the point where the reward was supposed to occur? (b) Explain why this happened using the TD equations. (c) Then, Continue the network and describe what occurs next in terms of the TD error signals plotted in the graph log, and explain why TD does this. (d) After the network is done learning again, does the stimulus still evoke an expectation of reward?
One thing you might have noticed is that during this extinction procedure, the weights are not reduced back to zero. Indeed, they are reduced only enough to bring the AC unit below threshold. This threshold effect may not apply to the real brain, because the brain's counterpart of the AC unit appears to be constantly active at a low level: either some additional inputs are driving it, or the resting potential and threshold are effectively much closer together than in this simulation. Thus, we might expect that the weights would have to be reduced much more to bring the AC unit below threshold. However, if behavior does suggest that extinction is incomplete (as it does in at least some situations), then this kind of threshold effect may be at work.
Now, let's explore second-order conditioning. We must first retrain the network on the stimulus 1 association.
Press StdProbs and select STIM1_US, and then press NewRun (which re-initializes the network) and let it train until the onset of the stimulus is clearly driving the expectation of reward.
Now, we will turn on the second stimulus, which starts at t = 2 and lasts for 8 time steps (as you can see from the stim_2 field in the control panel).
Do this by selecting StdProbs and STIM1_2_US. Go back to viewing act if you aren't already, and Step through the trial. Then, go back and look at the weights.
Essentially, the first stimulus acts just like a reward by triggering a positive δ(t), and thus allows the second stimulus to learn to predict this first stimulus.
Push Continue, and then Stop when the graph log stops changing.
You will see that the early anticipation of reward gets carried back to the onset of the second stimulus (which comes first in time).
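The same TD(0) sketch can reproduce second-order conditioning when a second stimulus precedes the first. As before, this is an illustration under assumptions rather than the simulator's code: each stimulus drives its own hypothetical tapped delay line, the value is the sum of the active units' weights, stimulus 1 is assumed to last 6 steps, and the learning rate and discount are illustrative. Phase 1 trains stimulus 1 with reward (like STIM1_US); phase 2 adds stimulus 2 at t = 2 for 8 steps (like STIM1_2_US).

```python
# TD(0) sketch of second-order conditioning with two tapped-delay-line stimuli.
# Assumptions: stim_1 lasts 6 steps, learning rate 0.3, discount 1.0, 20 steps.

N_STEPS = 20
REWARD_T = 16
STIM_1 = ("stim_1", 10, 6)   # onset t = 10; length 6 is a hypothetical choice
STIM_2 = ("stim_2", 2, 8)    # onset t = 2, 8 time steps, as in the stim_2 field


def active_units(t, stimuli):
    """(stimulus name, delay-line index) pairs active at time t."""
    units = []
    for name, onset, length in stimuli:
        k = t - onset
        if 0 <= k < length:
            units.append((name, k))
    return units


def value(w, t, stimuli):
    """Linear value estimate V(t): summed weights of all active units."""
    return sum(w[name][k] for name, k in active_units(t, stimuli))


def run_trial(w, stimuli, lr=0.3, gamma=1.0):
    """Run one trial, updating w in place; return the TD error at each step."""
    deltas = [0.0] * N_STEPS
    for t in range(1, N_STEPS):
        r = 1.0 if t == REWARD_T else 0.0
        delta = r + gamma * value(w, t, stimuli) - value(w, t - 1, stimuli)
        deltas[t] = delta
        # Credit every unit that contributed to V(t-1)
        for name, k in active_units(t - 1, stimuli):
            w[name][k] += lr * delta
    return deltas


def peak_time(deltas):
    """Time step with the largest TD error."""
    return max(range(len(deltas)), key=lambda t: deltas[t])


w = {"stim_1": [0.0] * 6, "stim_2": [0.0] * 8}

# Phase 1 (like STIM1_US): stimulus 1 followed by reward
for _ in range(200):
    run_trial(w, [STIM_1])

# Phase 2 (like STIM1_2_US): stimulus 2 now precedes stimulus 1 and the reward
first = run_trial(w, [STIM_1, STIM_2])
for _ in range(299):
    last = run_trial(w, [STIM_1, STIM_2])

print(peak_time(first))  # 10: onset of stimulus 1 acts like a reward
print(peak_time(last))   # 2: the anticipation has moved to stimulus 2 onset
```

On the first phase-2 trial the positive δ(t) at stimulus 1 onset is exactly what trains the last delay-line unit of stimulus 2; repeated trials then carry that prediction back to t = 2.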