Combined Model and Task Learning, and Other Mechanisms - Computational Explorations in Cognitive Neuroscience

Now, click back on r.wt in the network, and let this process play out by pressing Continue on the process control panel. Stop the training when the graph log stops changing.

You will see the anticipation creep forward both in the weights and in the graph log, ultimately resulting in activation of the AC unit when the stimulus first comes on at t = 10. This is the same process that was shown in figure 6.20, and it represents the heart of the TD algorithm.
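The backward creep of the prediction can be reproduced outside the simulator with a few lines of NumPy. This is a minimal sketch, not the simulator's code, and it rests on several assumptions: a tapped-delay-line input in which each time step after stimulus onset has its own input unit (so each step has its own weight, as in r.wt), a linear value estimate V(t) = w · x(t) standing in for the thresholded AC unit, a TD error written as δ(t) = r(t) + γV(t) − V(t−1) with γ = 1, a 20-step trial, and a stimulus that stays on from t = 10 through t = 15. The names csc_inputs and td_trial are hypothetical.

```python
import numpy as np

N_STEPS = 20                  # assumed trial length (not given in the text)
STIM1_ON, STIM1_LEN = 10, 6   # onset from the text; on through t = 15 (assumed)
REWARD_T = 16                 # reward time from the text


def csc_inputs(t_on, length, n_steps=N_STEPS):
    """Tapped-delay-line coding: input unit k is active only at time t_on + k,
    so every time step after stimulus onset has its own weight."""
    x = np.zeros((n_steps, length))
    for k in range(length):
        if t_on + k < n_steps:
            x[t_on + k, k] = 1.0
    return x


def td_trial(w, x, r, alpha=0.1, gamma=1.0, learn=True):
    """Run one trial of linear TD learning, updating w in place.

    V(t) = w . x(t) and delta(t) = r(t) + gamma * V(t) - V(t - 1);
    the error is credited to the inputs that were active at t - 1.
    Returns the TD error at every time step.
    """
    delta = np.zeros(len(r))
    for t in range(1, len(r)):
        delta[t] = r[t] + gamma * (x[t] @ w) - (x[t - 1] @ w)
        if learn:
            w += alpha * delta[t] * x[t - 1]
    return delta


x = csc_inputs(STIM1_ON, STIM1_LEN)
r = np.zeros(N_STEPS)
r[REWARD_T] = 1.0
w = np.zeros(STIM1_LEN)

first = td_trial(w, x, r, learn=False)   # untrained network
for _ in range(500):                     # keep training until nothing changes
    td_trial(w, x, r)
last = td_trial(w, x, r, learn=False)    # trained network

print(int(np.argmax(first)), int(np.argmax(last)))  # 16 10
```

Before training, the largest TD error sits at the reward (t = 16); after training it sits at stimulus onset (t = 10), and the error at t = 16 has dropped to about zero because the reward is now fully predicted. Printing w every few trials shows the weights filling in from the last stimulus step backward, which is the same creep visible in r.wt.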

At this point, there are many standard phenomena in classical conditioning that can be explored with this model. We will look at two: extinction and second-order conditioning. Extinction occurs when a stimulus that used to predict reward is no longer followed by it; the stimulus then loses its ability to predict this reward (which is appropriate). Second-order conditioning, as we discussed earlier, occurs when a conditioned stimulus serves as the unconditioned stimulus for another stimulus; in other words, the prediction of reward can be extended backward across two separate stimuli.

We can simulate extinction by simply turning off the reward that appears at t = 16. To do this, we need to alter the parameters on the control panel that determine the nature of the stimulus input and reward. First, to familiarize yourself with the controls, look at the stim_1 field. This controls the timing of the first stimulus: t_on is the time at which the stimulus comes on, and len is how long it stays on. The two var fields provide for variance around these values, which has so far been zero (you can explore these on your own later). The timing parameters for the reward are in the us (unconditioned stimulus) field. Although these fields determine when the stimuli and reward occur, a separate mechanism controls the probability that each of them comes on at all. These probabilities are what we want to manipulate. The master control for these probabilities is contained in the probs field, but we will use a shortcut through the StdProbs button.
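The split between timing fields (when an event happens) and probabilities (whether it happens) can be mimicked with a small schedule generator. This is a hypothetical sketch, not the simulator's interface: the Timing class, the make_trial function, and the probabilities assigned to the STIM1_US, STIM1_NO_US, and STIM1_2_US presets are assumptions inferred from the preset names; the var fields are treated here as a uniform integer jitter around t_on and len; and the durations of stimulus 1 and the US are assumed.

```python
import random
from dataclasses import dataclass


@dataclass
class Timing:
    """Hypothetical stand-in for a stim_1 / stim_2 / us timing field."""
    t_on: int           # time step at which the event comes on
    length: int         # how many steps it stays on (the panel's "len")
    t_on_var: int = 0   # jitter around t_on (assumed uniform; zero here)
    len_var: int = 0    # jitter around len (assumed uniform; zero here)

    def sample(self, rng):
        """Return the set of time steps during which the event is on."""
        start = self.t_on + rng.randint(-self.t_on_var, self.t_on_var)
        dur = max(1, self.length + rng.randint(-self.len_var, self.len_var))
        return set(range(start, start + dur))


TIMING = {
    "stim_1": Timing(t_on=10, length=6),   # duration assumed
    "stim_2": Timing(t_on=2, length=8),
    "us": Timing(t_on=16, length=1),       # duration assumed
}

# Assumed presentation probabilities behind each StdProbs preset.
PRESETS = {
    "STIM1_US":    {"stim_1": 1.0, "stim_2": 0.0, "us": 1.0},
    "STIM1_NO_US": {"stim_1": 1.0, "stim_2": 0.0, "us": 0.0},
    "STIM1_2_US":  {"stim_1": 1.0, "stim_2": 1.0, "us": 1.0},
}


def make_trial(preset, rng, n_steps=20):
    """Decide which events occur (probabilities), then when (timing)."""
    probs = PRESETS[preset]
    trial = {}
    for name, timing in TIMING.items():
        on = timing.sample(rng) if rng.random() < probs[name] else set()
        trial[name] = [1 if t in on else 0 for t in range(n_steps)]
    return trial


trial = make_trial("STIM1_NO_US", random.Random(0))
print(sum(trial["stim_1"]), sum(trial["us"]))  # 6 0
```

Keeping the probabilities separate from the timing means that switching from acquisition to extinction only changes one number (the US probability) while every timing parameter stays fixed.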

Press the StdProbs button, and select STIM1_NO_US. This indicates that stimulus 1 will be presented but no US.

Now, Clear the graph log and Step through a trial.

Question 6.5 (a) What happened at the point where the reward was supposed to occur? (b) Explain why this happened using the TD equations. (c) Then, Continue the network and describe what occurs next in terms of the TD error signals plotted in the graph log, and explain why TD does this. (d) After the network is done learning again, does the stimulus still evoke an expectation of reward?

One thing you might have noticed is that during this extinction procedure, the weights are not reduced back to zero. Indeed, they are reduced only enough to bring the AC unit below threshold. The effects of this threshold may not be applicable to the real brain because it appears that the AC unit is constantly active at a low level, so either some additional inputs are driving it or the resting potential and threshold are effectively much closer than in this simulation. Thus, we might expect that the weights would have to be reduced much more to bring the AC unit below threshold. However, if the behavior did suggest that extinction was not complete (as it does in at least some situations), then this kind of threshold effect may be at work.

Now, let's explore second-order conditioning. We must first retrain the network on the stimulus 1 association.

Press StdProbs and select STIM1_US, and then NewRun (which re-initializes the network) until the onset of the stimulus is clearly driving the expectation of reward.

Now, we will turn on the second stimulus, which starts at t = 2 and lasts for 8 time steps (as you can see from the stim_2 field in the control panel).

Do this by selecting StdProbs and STIM1_2_US. Go back to viewing act if you aren't already, and Step through the trial. Then, go back and look at the weights.

Essentially, the first stimulus acts just like a reward by triggering a positive δ(t), and thus allows the second stimulus to learn to predict this first stimulus.

Push Continue, and then Stop when the graph log stops changing.

You will see that the early anticipation of reward gets carried out to the onset of the second stimulus (which comes first in time).
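The chaining described above can be checked with a toy TD model. This minimal sketch makes the same assumptions as a plain linear TD learner would need here, none of which come from the simulator itself: tapped-delay-line inputs, a linear value in place of the thresholded AC unit, δ(t) = r(t) + γV(t) − V(t−1) with γ = 1, a 20-step trial, and stimulus 1 on from t = 10 through t = 15. Stimulus 2 occupies t = 2 through t = 9, as in the stim_2 field, so its input units are already silent when the reward arrives and can only learn from the positive δ(t) produced when stimulus 1 comes on. Phase 2 assumes that STIM1_2_US presents both stimuli and the reward, as its name suggests; the function names are hypothetical.

```python
import numpy as np

N_STEPS = 20             # assumed trial length
STIM1 = (10, 6)          # (t_on, len): onset from the text, duration assumed
STIM2 = (2, 8)           # (t_on, len) as given in the stim_2 field
REWARD_T = 16


def csc_inputs(t_on, length, n_steps=N_STEPS):
    """Tapped-delay-line coding: unit k is active only at time t_on + k."""
    x = np.zeros((n_steps, length))
    for k in range(length):
        if t_on + k < n_steps:
            x[t_on + k, k] = 1.0
    return x


def td_trial(w, x, r, alpha=0.1, gamma=1.0, learn=True):
    """Linear TD, w updated in place: delta(t) = r(t) + gamma*V(t) - V(t-1)."""
    delta = np.zeros(len(r))
    for t in range(1, len(r)):
        delta[t] = r[t] + gamma * (x[t] @ w) - (x[t - 1] @ w)
        if learn:
            w += alpha * delta[t] * x[t - 1]
    return delta


r = np.zeros(N_STEPS)
r[REWARD_T] = 1.0
x1, x2 = csc_inputs(*STIM1), csc_inputs(*STIM2)
x_stim1_only = np.hstack([x1, np.zeros_like(x2)])  # stimulus 2 units silent
x_both = np.hstack([x1, x2])                       # both stimuli presented
w = np.zeros(x_both.shape[1])                      # 6 + 8 input weights

# Phase 1 (like STIM1_US): stimulus 1 comes to predict the reward.
for _ in range(500):
    td_trial(w, x_stim1_only, r)

# Phase 2 (like STIM1_2_US): stimulus 2 now precedes stimulus 1.
before = td_trial(w, x_both, r, learn=False)
for _ in range(500):
    td_trial(w, x_both, r)
after = td_trial(w, x_both, r, learn=False)

print(int(np.argmax(before)), int(np.argmax(after)))  # 10 2
```

On the first STIM1_2_US trial the largest TD error is still at stimulus 1 onset (t = 10); after training it has moved to the onset of stimulus 2 (t = 2), even though stimulus 2 is never on at the same time as the reward.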