right side of .5. The red line shows the error for the current training set (AB, then AC), and the yellow line shows the error for testing, which is always done on the AB list. Note that the red and yellow lines start out roughly correlated with each other, but they are not identical, because testing (yellow line) occurs after each epoch of training, so the weights differ from those in effect when each item was presented during the training epoch (red line).
When the red line (training error) gets to zero (or if 50 epochs pass without getting to zero), the network automatically switches to training on the AC list. Thus, you will see the red line for the training set jump up immediately for this new set of training events. However, replicating the McCloskey and Cohen (1989) results (figure 9.4), you will also see the yellow line for testing the AB list jump up immediately as well, indicating that learning on the AC list has interfered catastrophically with the prior learning on the AB list.
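To make this training schedule concrete, here is a minimal NumPy sketch of the AB-AC procedure (an illustration only, not the simulator's code): a small backpropagation network is trained on the AB list until its training error reaches zero or 50 epochs pass, then switched to the AC list, with the AB list tested after every epoch. The network size, learning rate, and item-counting error measure are arbitrary choices for this example, and whether the toy network actually reaches zero error within 50 epochs depends on those choices.

    import numpy as np

    rng = np.random.default_rng(0)
    n_items, n_hidden = 10, 40

    A = np.eye(n_items)                     # shared A cue patterns
    B = rng.permutation(np.eye(n_items))    # B associates (first list)
    C = rng.permutation(np.eye(n_items))    # C associates (second list)

    W1 = rng.normal(0.0, 0.25, (n_items, n_hidden))   # small random initial weights
    W2 = rng.normal(0.0, 0.25, (n_hidden, n_items))

    def forward(x):
        h = 1.0 / (1.0 + np.exp(-(x @ W1)))           # sigmoid hidden layer
        o = 1.0 / (1.0 + np.exp(-(h @ W2)))           # sigmoid output layer
        return h, o

    def n_wrong(targets):
        _, o = forward(A)                             # test all items
        return int((o.round() != targets).any(axis=1).sum())

    def train_epoch(targets, lr=1.0):
        global W1, W2
        h, o = forward(A)
        d_out = (o - targets) * o * (1 - o)           # squared-error gradient at outputs
        d_hid = (d_out @ W2.T) * h * (1 - h)
        W2 -= lr * (h.T @ d_out)
        W1 -= lr * (A.T @ d_hid)
        return n_wrong(targets)

    history = []   # (phase, epoch, training error, AB test error)
    for name, targets in (("AB", B), ("AC", C)):
        for epoch in range(50):                       # switch lists after 50 epochs at most
            train_err = train_epoch(targets)          # "red line": error on current list
            history.append((name, epoch, train_err, n_wrong(B)))   # n_wrong(B) is the "yellow line"
            if train_err == 0:
                break
        print(name, "training done; items wrong on AB test:", n_wrong(B))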
Let's collect some statistics by running a batch of training runs.
Turn on the Display in the network, do a Run.

Notice how overlapping the distributed hidden unit representations are: many of the same units are active across multiple input patterns.
To see this overlap more clearly, view act_avg in the network display. This shows the average unit activations across patterns (computed using a running average). A subset of the units have relatively high (near 50%) activation averages, indicating that they are disproportionately active across patterns.
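The running average itself is straightforward; the sketch below shows one common way such a time-averaged activation can be computed, as an exponential moving average of each unit's activation. The update form and time constant here are assumptions for illustration, not necessarily the exact ones the simulator uses.

    import numpy as np

    def update_act_avg(act_avg, act, dt=0.01):
        # Exponential running average of unit activations; dt is an assumed
        # time constant, not necessarily the simulator's value.
        return act_avg + dt * (act - act_avg)

    # Example: accumulate averages over 200 random binary activation patterns.
    rng = np.random.default_rng(1)
    probs = np.linspace(0.1, 0.9, 12)                  # some units are active more often
    patterns = (rng.random((200, 12)) < probs).astype(float)
    act_avg = np.zeros(12)
    for act in patterns:
        act_avg = update_act_avg(act_avg, act)
    print(act_avg.round(2))   # units active more often end up with higher averages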
This overlap seems obviously problematic from an interference perspective, because the AC list will activate and reuse the same units from the AB list, altering their weights to support the C associate instead of the B. Thus, by reducing the extent to which the hidden unit representations overlap (i.e., by making them sparser), we might be able to encourage the network to use separate representations for learning these two lists of items.
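To make the notion of overlap concrete, here is a small sketch of one possible overlap measure (our own illustration, not a statistic the simulator reports): the fraction of active units shared between two binary hidden activation patterns. High values mean the AC items would reuse, and therefore rewrite the weights of, units already committed to AB items.

    import numpy as np

    def overlap(p1, p2):
        # Fraction of active units shared by two binary activation patterns
        # (intersection over union of the active units).
        p1, p2 = np.asarray(p1, dtype=bool), np.asarray(p2, dtype=bool)
        return (p1 & p2).sum() / max((p1 | p2).sum(), 1)

    ab_item = np.array([1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1])   # hidden pattern for an AB item
    ac_item = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1])   # hidden pattern for the same A cue with C
    print(overlap(ab_item, ac_item))   # about 0.71 here: most active units are shared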
Do View, BATCH_TEXT_LOG to get a log of the results. Then run a batch of 5 “subjects” with Batch.

The summary average statistics taken at the end of the AC list training for each “subject” will appear in the batch text log after the 5 subjects have been run. avg_sum_se shows the average training error, which should be 0, and avg_tst_se shows the average testing error on the AB list.
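In simulator-free terms, a batch run simply repeats the whole AB-AC procedure with fresh random weights for each “subject” and averages the end-of-training statistics. A minimal sketch of that bookkeeping, where run_subject is a hypothetical stand-in for one complete AB-then-AC training run:

    import statistics

    def batch_stats(run_subject, n_subjects=5):
        # run_subject(seed) is a hypothetical stand-in: it runs one complete
        # AB-then-AC training sequence from fresh random weights and returns
        # (final training error, AB-list test error) for that "subject".
        results = [run_subject(seed) for seed in range(n_subjects)]
        avg_sum_se = statistics.mean(r[0] for r in results)   # average training error
        avg_tst_se = statistics.mean(r[1] for r in results)   # average AB test error
        return avg_sum_se, avg_tst_se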
Let's test this idea by reducing the hid_kwta parameter in the ab_ac_ctrl panel to 4 instead of 12. This will allow only 4 units to be active at a time in the hidden layer, which should produce less overlap among the distributed representations.
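The hid_kwta parameter implements k-winners-take-all (kWTA) inhibition in the hidden layer. The sketch below is a deliberately simplified version of the idea: it hard-selects the k most excited units, whereas the actual kWTA mechanism computes an inhibitory current that allows roughly k units to remain above threshold.

    import numpy as np

    def kwta(net_input, k=4):
        # Simplified k-winners-take-all: only the k most excited units stay active.
        act = np.zeros_like(net_input)
        winners = np.argsort(net_input)[-k:]    # indices of the k largest net inputs
        act[winners] = 1.0
        return act

    net_input = np.array([0.2, 1.3, 0.7, 0.1, 0.9, 1.1, 0.4, 0.8, 0.3, 1.5, 0.6, 0.05])
    print(kwta(net_input, k=4))    # only 4 of the 12 hidden units remain active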
Clear the graph log and run a Batch with this
reduced hidden layer activity (don't forget to turn the
Display off in the network).
Question 9.2 (a) Report the average testing statistic (avg_tst_se) for a batch run of 5 simulated subjects. (b) How do these results compare to the human data presented in figure 9.4? (c) Looking at the training graph log, roughly how many epochs does the network take to reach its maximum error on the AB list after the introduction of the AC list?
Question 9.3 (a) Report the resulting average testing statistic (avg_tst_se). (b) Describe any effects that this manipulation has on the number of epochs it takes for the network to reach its maximum error on the AB list after the introduction of the AC list. (c) How do these results compare to the human data presented in figure 9.4?
Having replicated the basic catastrophic interference phenomenon, let's see if we can do anything to reduce the level of interference. Our strategy will be to retain the same basic architecture and learning mechanisms while manipulating certain key parameters. The intention here is to illuminate some principles that will prove important for understanding the origin of these interference effects, and how they could potentially be reduced, though we will see that they have relatively small effects in this particular context.
The network may not have performed as well as expected because nothing was done to encourage it to use different sets of 4 units to represent the different associates. One way we can encourage this is to increase the variance of the initial random weights, making each unit have a more quirky pattern of responses that should encourage different units to encode the different associates.
Thus, change wt_var from .25 to .4.
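For illustration, here is a sketch of what wt_var plausibly controls: the spread of the distribution from which the initial random weights are drawn. The uniform form, the mean of .5, and the clipping to [0, 1] are assumptions made for this example, not the simulator's exact code.

    import numpy as np

    def init_weights(shape, wt_mean=0.5, wt_var=0.25, seed=0):
        # Assumed reading of the parameters: uniform weights centered on wt_mean
        # with half-range wt_var, clipped to the [0, 1] weight range.
        rng = np.random.default_rng(seed)
        w = rng.uniform(wt_mean - wt_var, wt_mean + wt_var, shape)
        return np.clip(w, 0.0, 1.0)

    # Larger wt_var spreads the initial weights out more, making units respond
    # more idiosyncratically from the start.
    print(init_weights((4, 4), wt_var=0.25).std().round(3),
          init_weights((4, 4), wt_var=0.4).std().round(3))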