right side of .5. The red line shows the error for the current training set (AB, then AC), and the yellow line shows the error for testing, which is always done on the AB list. Note that the red and yellow lines start out roughly correlated with each other, but they are not identical, because testing (yellow line) occurs after each epoch of training, so the weights differ from those in effect when each item was presented during the training epoch (red line).
When the red line (training error) gets to zero (or if 50 epochs pass without getting to zero), the network automatically switches to training on the AC list. Thus, you will see the red line for the training set jump up immediately for this new set of training events. However, replicating the McCloskey and Cohen (1989) results (figure 9.4), you will also see the yellow line for testing the AB list jump up immediately as well, indicating that learning on the AC list has interfered catastrophically with the prior learning on the AB list.
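To make this training schedule concrete, here is a minimal NumPy sketch of the AB-AC procedure (an illustration only, not the simulator's code): a small backpropagation network is trained on the AB list until its training error reaches zero or 50 epochs pass, then switched to the AC list, with the AB list tested after every epoch. The network size, learning rate, and item-counting error measure are arbitrary choices for this example, and whether the toy network actually reaches zero error within 50 epochs depends on those choices.

    import numpy as np

    rng = np.random.default_rng(0)
    n_items, n_hidden = 10, 40

    A = np.eye(n_items)                     # shared A cue patterns
    B = rng.permutation(np.eye(n_items))    # B associates (first list)
    C = rng.permutation(np.eye(n_items))    # C associates (second list)

    W1 = rng.normal(0.0, 0.25, (n_items, n_hidden))   # small random initial weights
    W2 = rng.normal(0.0, 0.25, (n_hidden, n_items))

    def forward(x):
        h = 1.0 / (1.0 + np.exp(-(x @ W1)))           # sigmoid hidden layer
        o = 1.0 / (1.0 + np.exp(-(h @ W2)))           # sigmoid output layer
        return h, o

    def n_wrong(targets):
        _, o = forward(A)                             # test all items
        return int((o.round() != targets).any(axis=1).sum())

    def train_epoch(targets, lr=1.0):
        global W1, W2
        h, o = forward(A)
        d_out = (o - targets) * o * (1 - o)           # squared-error gradient at outputs
        d_hid = (d_out @ W2.T) * h * (1 - h)
        W2 -= lr * (h.T @ d_out)
        W1 -= lr * (A.T @ d_hid)
        return n_wrong(targets)

    history = []   # (phase, epoch, training error, AB test error)
    for name, targets in (("AB", B), ("AC", C)):
        for epoch in range(50):                       # switch lists after 50 epochs at most
            train_err = train_epoch(targets)          # "red line": error on current list
            history.append((name, epoch, train_err, n_wrong(B)))   # n_wrong(B) is the "yellow line"
            if train_err == 0:
                break
        print(name, "training done; items wrong on AB test:", n_wrong(B))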
Let's collect some statistics by running a batch of training runs.
Turn on the Display in the network, do a Run.

Notice how overlapping the distributed hidden unit representations are: many of the same units are active across multiple input patterns.
To see this overlap more clearly, view act_avg in the network display. This shows the average unit activations across patterns (computed using a running average). A subset of the units have relatively high (near 50%) activation averages, indicating that they are disproportionately active across patterns.
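The running average itself is straightforward; the sketch below shows one common way such a time-averaged activation can be computed, as an exponential moving average of each unit's activation. The update form and time constant here are assumptions for illustration, not necessarily the exact ones the simulator uses.

    import numpy as np

    def update_act_avg(act_avg, act, dt=0.01):
        # Exponential running average of unit activations; dt is an assumed
        # time constant, not necessarily the simulator's value.
        return act_avg + dt * (act - act_avg)

    # Example: accumulate averages over 200 random binary activation patterns.
    rng = np.random.default_rng(1)
    probs = np.linspace(0.1, 0.9, 12)                  # some units are active more often
    patterns = (rng.random((200, 12)) < probs).astype(float)
    act_avg = np.zeros(12)
    for act in patterns:
        act_avg = update_act_avg(act_avg, act)
    print(act_avg.round(2))   # units active more often end up with higher averages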
This overlap seems obviously problematic from an interference perspective, because the AC list will activate and reuse the same units from the AB list, altering their weights to support the C associate instead of the B. Thus, by reducing the extent to which the hidden unit representations overlap (i.e., by making them sparser), we might be able to encourage the network to use separate representations for learning these two lists of items.
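To make the notion of overlap concrete, here is a small sketch of one possible overlap measure (our own illustration, not a statistic the simulator reports): the fraction of active units shared between two binary hidden activation patterns. High values mean the AC items would reuse, and therefore rewrite the weights of, units already committed to AB items.

    import numpy as np

    def overlap(p1, p2):
        # Fraction of active units shared by two binary activation patterns
        # (intersection over union of the active units).
        p1, p2 = np.asarray(p1, dtype=bool), np.asarray(p2, dtype=bool)
        return (p1 & p2).sum() / max((p1 | p2).sum(), 1)

    ab_item = np.array([1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1])   # hidden pattern for an AB item
    ac_item = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1])   # hidden pattern for the same A cue with C
    print(overlap(ab_item, ac_item))   # about 0.71 here: most active units are shared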
Do View, BATCH_TEXT_LOG to get a log of the results. Then run a batch of 5 “subjects” with Batch.

The summary average statistics taken at the end of the AC list training for each “subject” will appear in the batch text log after the 5 subjects have been run. avg_sum_se shows the average training error, which should be 0, and avg_tst_se shows the average testing error on the AB list.
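In simulator-free terms, a batch run simply repeats the whole AB-AC procedure with fresh random weights for each “subject” and averages the end-of-training statistics. A minimal sketch of that bookkeeping, where run_subject is a hypothetical stand-in for one complete AB-then-AC training run:

    import statistics

    def batch_stats(run_subject, n_subjects=5):
        # run_subject(seed) is a hypothetical stand-in: it runs one complete
        # AB-then-AC training sequence from fresh random weights and returns
        # (final training error, AB-list test error) for that "subject".
        results = [run_subject(seed) for seed in range(n_subjects)]
        avg_sum_se = statistics.mean(r[0] for r in results)   # average training error
        avg_tst_se = statistics.mean(r[1] for r in results)   # average AB test error
        return avg_sum_se, avg_tst_se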
Let's test this idea by reducing the hid_kwta parameter in the ab_ac_ctrl panel to 4 instead of 12. This will allow only 4 units to be active at a time in the hidden layer, which should produce less overlap among the distributed representations.
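The hid_kwta parameter implements k-winners-take-all (kWTA) inhibition in the hidden layer. The sketch below is a deliberately simplified version of the idea: it hard-selects the k most excited units, whereas the actual kWTA mechanism computes an inhibitory current that allows roughly k units to remain above threshold.

    import numpy as np

    def kwta(net_input, k=4):
        # Simplified k-winners-take-all: only the k most excited units stay active.
        act = np.zeros_like(net_input)
        winners = np.argsort(net_input)[-k:]    # indices of the k largest net inputs
        act[winners] = 1.0
        return act

    net_input = np.array([0.2, 1.3, 0.7, 0.1, 0.9, 1.1, 0.4, 0.8, 0.3, 1.5, 0.6, 0.05])
    print(kwta(net_input, k=4))    # only 4 of the 12 hidden units remain active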
Clear the graph log and run a Batch with this
reduced hidden layer activity (don't forget to turn the
Display off in the network).
Question 9.2 (a) Report the average testing statistic (avg_tst_se) for a batch run of 5 simulated subjects. (b) How do these results compare to the human data presented in figure 9.4? (c) Looking at the training graph log, roughly how many epochs does the network take to reach its maximum error on the AB list after the introduction of the AC list?
Question 9.3 (a) Report the resulting average testing statistic (avg_tst_se). (b) Describe any effects that this manipulation has on the number of epochs it takes for the network to reach its maximum error on the AB list after the introduction of the AC list. (c) How do these results compare to the human data presented in figure 9.4?
Having replicated the basic catastrophic interference phenomenon, let's see if we can do anything to reduce the level of interference. Our strategy will be to retain the same basic architecture and learning mechanisms while manipulating certain key parameters. The intention here is to illuminate some principles that will prove important for understanding the origin of these interference effects, and how they could potentially be reduced, though we will see that they have relatively small effects in this particular context.
The network may not have performed as well as expected because nothing was done to encourage it to use different sets of 4 units to represent the different associates. One way we can encourage this is to increase the variance of the initial random weights, making each unit have a more quirky pattern of responses that should encourage different units to encode the different associates.
Thus, change wt_var from .25 to .4.
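For illustration, here is a sketch of what wt_var plausibly controls: the spread of the distribution from which the initial random weights are drawn. The uniform form, the mean of .5, and the clipping to [0, 1] are assumptions made for this example, not the simulator's exact code.

    import numpy as np

    def init_weights(shape, wt_mean=0.5, wt_var=0.25, seed=0):
        # Assumed reading of the parameters: uniform weights centered on wt_mean
        # with half-range wt_var, clipped to the [0, 1] weight range.
        rng = np.random.default_rng(seed)
        w = rng.uniform(wt_mean - wt_var, wt_mean + wt_var, shape)
        return np.clip(w, 0.0, 1.0)

    # Larger wt_var spreads the initial weights out more, making units respond
    # more idiosyncratically from the start.
    print(init_weights((4, 4), wt_var=0.25).std().round(3),
          init_weights((4, 4), wt_var=0.4).std().round(3))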