Let's start by examining the network (figure 6.22). The input layer contains three rows of 20 units each. This is the CSC (complete serial compound), where the rows each represent a different stimulus, and the columns represent points in time. Then, there is a single AC unit that receives weights from all of these input units.

[Figure 6.22: The reinforcement learning network with CSC input representation of stimuli by time. The input layer has one row per stimulus (tone, light, odor) and one column per time step (0-19), with the AC unit above.]

Click on r.wt and then on the AC unit to see that the weights start out initialized to zero. Then, click back to act.

Let's see how the CSC works in action.

Do Step on the control panel.

Nothing should happen, because no stimulus or reward was present at t = 0. However, you can monitor the time steps from the tick: value displayed at the bottom of the network view (a tick is one time step in a sequence of events in the simulator).

Continue to Step until you see an activation in the input layer (this should take 10 more steps).

This input activation represents the fact that the first stimulus (i.e., the "tone" stimulus in row 1) came on at t = 10.

Continue to Step some more.

You will see that this stimulus remains active for 6 more time steps (through t = 15). Then, notice that just as the stimulus disappears, the AC unit becomes activated (at t = 16). This activation reflects the fact that a reward was received, and the plus-phase activation of this unit was clamped to the reward value (.95 here).

Now, let's see what this reward did to the weights.

Click on r.wt and then on the AC unit.

Notice that the weights have increased for the unit representing the stimulus in its last position just before it went off (at t = 15). Thus, the reward caused the AC unit to go from 0 in the minus phase to .95 in the plus phase, and this δ(t = 16) updated the weights based on the sending activations at the previous time step (t = 15), just as discussed in the previous section.

We can monitor the δ(t) values (i.e., the plus-minus phase difference) for the AC unit as a function of time step using a graph log.

Do View and select GRAPH_LOG. Then, Step once and the graph log should update.

This log clearly shows the blip at t = 16, which goes back down to 0 as you continue to Step. This is because we are maintaining the reward active until the end of the entire sequence (at t = 20), so there is no change in the AC unit, and therefore δ = 0.

Now, switch back to act and Step again until you get to t = 15 again on the second pass through.

Recall that the weight for this unit has been increased, but there is no activation of the AC unit as one might have expected. This is due to the thresholded nature of the units.

To see this, click on net.

You will see that the unit did receive some positive net input.

Continue to Step until you get to trial 3 (also shown at the bottom of the network as trial:3), time step t = 15.

Due to accumulating weight changes from the previous 3 trials, the weight into the AC unit is now strong enough to drive it over threshold. If you look at the graph log, you will see that there is now a positive δ(t) at time step 15.

Thus, the network is now anticipating the reward one time step earlier. This anticipation has two effects. First, click on r.wt.

You should notice that the weight from the previous time step (now t = 14) is increased as a result of this positive δ(t = 15). These weight changes will eventually lead to the reward being anticipated earlier and earlier.

Now, do one more Step, and observe the graph log.

The second effect is that this anticipation reduced the magnitude of the δ(t = 16).
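The dynamic traced in this exercise can be sketched as a minimal simulation: a single "tone" row of the CSC drives a thresholded AC unit, δ(t) is the difference between the current activation (or the clamped reward) and the expectation carried from the previous tick, and each δ(t) trains the weights from the senders active at t − 1. The learning rate, threshold, and the simplification of applying the reward only at t = 16 are assumptions for illustration, not the simulator's actual parameters.

```python
import numpy as np

N_STEPS  = 20    # ticks per trial (columns of the CSC)
TONE_ON  = 10    # the tone comes on at t = 10 ...
TONE_OFF = 16    # ... and stays on through t = 15
REWARD_T = 16    # reward arrives just as the tone goes off
REWARD   = 0.95  # plus-phase clamp value for the AC unit
EPS      = 0.2   # learning rate (assumed for illustration)
THETA    = 0.5   # AC activation threshold (assumed)

# CSC input for the tone: x[t] is a vector over tone-by-time units,
# with unit t active only while the tone is on at time t.
x = np.zeros((N_STEPS, N_STEPS))
for t in range(TONE_ON, TONE_OFF):
    x[t, t] = 1.0

w = np.zeros(N_STEPS)  # weights from the CSC units into the AC unit

def ac(t):
    """Thresholded AC activation given the CSC input at time t."""
    net = w @ x[t]
    return net if net >= THETA else 0.0

for trial in range(4):
    deltas = np.zeros(N_STEPS)
    for t in range(1, N_STEPS):
        minus = ac(t - 1)  # expectation carried from the previous tick
        plus = REWARD if t == REWARD_T else ac(t)  # clamp to reward
        deltas[t] = plus - minus
        # delta(t) trains the weights from the senders active at t - 1
        w += EPS * deltas[t] * x[t - 1]
    print(f"trial {trial}: delta(15) = {deltas[15]:.2f}, "
          f"delta(16) = {deltas[16]:.2f}")
```

With these assumed parameters the sketch reproduces the walkthrough: for the first three trials δ(15) = 0 and δ(16) = .95 because the growing weight stays below threshold, and on trial 3 the accumulated weight crosses threshold, a positive δ appears at t = 15 (training the t = 14 weight), and the δ at t = 16 shrinks accordingly.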