This is the minus phase for the beginning of a sequence (one pass through the FSA grammar), which always starts with the letter B, and the context units zeroed. The network will produce some random expectation of which letters are coming next. Note that there is some noise in the unit activations; this helps the network pick one unit out of the two possible ones at random.
To monitor the network's performance over learning, we need an error statistic that converges to zero when the network has learned the task perfectly (which is not the case with the standard SSE, due to the randomness of the task). Thus, we have a new statistic that reports an error (of 1) if the output unit was not one of the two possible outputs (i.e., as shown in the Targets layer). This is labeled as sum_fsa_err in the log displays.
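In effect, this statistic asks only whether the letter the network chose was one of the legal next letters. A minimal sketch of such a check in Python (the function and variable names here are hypothetical, not the simulator's actual code):

    import numpy as np

    def fsa_err(output_acts, target_acts, thresh=0.5):
        """Return 1 if the most active output unit is not one of the
        units marked as legal in the Targets layer, else 0."""
        winner = int(np.argmax(output_acts))       # the letter the network chose
        legal = np.where(target_acts > thresh)[0]  # the two legal next letters
        return 0.0 if winner in legal else 1.0

    # Example: the Targets layer marks indices 2 and 3 (say T and P) as legal,
    # and the network most strongly activated index 2.
    targets = np.array([0., 0., 1., 1., 0., 0., 0.])
    outputs = np.array([.05, .10, .90, .20, .00, .10, .00])
    print(fsa_err(outputs, targets))   # -> 0.0 (not an error)

Summed over the events in an epoch, a score like this can reach zero even though the individual letter guesses remain random.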
Then, Step again to see the plus phase.
You should see that one of the two possible subsequent letters (T or P) is strongly activated; this unit indicates which letter actually came next in the sequence. Thus, the network only ever learns about one of the two possible subsequent letters on each trial (because they are chosen at random). It has to learn that a given node has two possible outputs by integrating experience over different trials, which is one of the things that makes this a somewhat challenging task to learn.
An interesting aspect of this task is that even when the network has done as well as it possibly could, it should still make roughly 50 percent "errors," because it ends up making a discrete guess as to which output will come next, which can only be right 50 percent of the time. This could cause problems for learning if it introduced a systematic error signal that constantly increased or decreased the bias weights. It does not, however, because a unit will be correctly active about as often as it will be incorrectly inactive, so the overall net error will be zero. Note that if we allowed both units to become active, this would not be the case: one of the units would always be incorrectly active, and this would introduce a net negative error and large negative bias weights (which would eventually shut down the activation of the output units).
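This cancellation can be verified with a little arithmetic or a quick simulation. The bias weight change for an output unit in error-driven (GeneRec/CHL-style) learning is roughly proportional to the difference between its plus-phase and minus-phase activations; the sketch below (hypothetical, with idealized 0/1 activations) averages that difference for one of the two letter units under the two output policies:

    import random

    def avg_bias_error(pick_one, n_trials=100_000):
        """Average plus-minus activation difference for the 'T' output unit
        at a node whose two legal continuations (T and P) each occur half
        the time.  pick_one=True: the network guesses one letter at random;
        pick_one=False: it activates both letters in the minus phase."""
        total = 0.0
        for _ in range(n_trials):
            actual_is_T = random.random() < 0.5                       # plus phase (target)
            minus_T = (random.random() < 0.5) if pick_one else True   # minus phase guess
            total += float(actual_is_T) - float(minus_T)
        return total / n_trials

    print(round(avg_bias_error(pick_one=True), 3))    # ~ 0.0  -> no net drift
    print(round(avg_bias_error(pick_one=False), 3))   # ~ -0.5 -> systematic negative error

With a single random guess, the incorrectly active and incorrectly inactive cases occur equally often and cancel; with both units active, the error is systematically negative, which is the pressure toward large negative bias weights described above.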
One possible objection to having the network pick one output at random instead of allowing both to be on is that it somehow means that the network will be "surprised" by the actual response when it differs from the guess (i.e., about 50% of the time). This is actually not the case, because the hidden layer representation remains essentially the same for both outputs (reflecting the node identity, more or less), and thus does not change when the actual output is presented in the plus phase. Thus, the "higher level" internal representation encompasses both possible outputs, while the lower-level output representation randomly chooses one of them. This situation will be important later as we consider how networks can efficiently represent multiple items (see chapter 7 for further discussion).
Now, continue to Step into the minus phase of the
next event in the sequence.
You should see now that the Context units are updated with a copy of the prior hidden unit activations.
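This copy operation is the defining feature of a simple recurrent network. A minimal sketch of the idea in Python (the layer sizes, random weights, and letter coding are made up for illustration and do not reproduce the actual network):

    import numpy as np

    rng = np.random.default_rng(0)
    n_hidden, n_letters = 12, 7                  # hypothetical layer sizes
    letters = "BTSXVPE"                          # the grammar's letter set

    W_in  = rng.normal(0, 0.1, (n_letters, n_hidden))   # input -> hidden
    W_ctx = rng.normal(0, 0.1, (n_hidden, n_hidden))    # context -> hidden
    W_out = rng.normal(0, 0.1, (n_hidden, n_letters))   # hidden -> output

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def one_hot(ch):
        v = np.zeros(n_letters)
        v[letters.index(ch)] = 1.0
        return v

    def srn_step(letter, context):
        """One minus-phase pass: the hidden layer sees the current letter
        plus the context, a copy of the previous hidden activations."""
        hidden = sigmoid(one_hot(letter) @ W_in + context @ W_ctx)
        output = sigmoid(hidden @ W_out)
        return hidden, output

    context = np.zeros(n_hidden)        # zeroed at the start of each sequence
    for ch in "BTXSE":                  # one string the grammar can produce
        hidden, output = srn_step(ch, context)
        context = hidden.copy()         # context units copy the hidden layer

In the simulation, the values copied into the Context layer are the hidden units' plus-phase activations from the previous event, which is what the next step lets you verify.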
To verify this, click on act_p.
This will show the plus phase activations from the
previous event.
Now you can continue to Step through the rest of the sequence. We can open up a training graph log by doing View, TRAIN_GRAPH_LOG, and then we can Run.
As the network runs, a special type of environment (called a ScriptEnv) dynamically creates 25 new sequences of events every other epoch (to speed the computation, because the script is relatively slow). Thus, instead of creating a whole bunch of training examples from the underlying FSA in advance, they are created on-line with a script that implements the Reber grammar FSA.
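Such a script amounts to a few lines of FSA traversal. Here is a minimal Python sketch of a Reber-grammar generator of this kind (the node numbering below is one common convention for the Reber grammar and may not match the labels used in the simulation):

    import random

    # Each node lists its two possible (letter, next node) transitions;
    # the grammar chooses between them at random with equal probability.
    REBER = {
        0: [("T", 1), ("P", 2)],
        1: [("S", 1), ("X", 3)],
        2: [("T", 2), ("V", 4)],
        3: [("X", 2), ("S", 5)],
        4: [("P", 3), ("V", 5)],
    }

    def reber_string():
        """Generate one sequence: it always starts with B and ends with E."""
        s, node = "B", 0
        while node != 5:                       # node 5 is the terminal node
            letter, node = random.choice(REBER[node])
            s += letter
        return s + "E"

    # e.g., a fresh batch of 25 training sequences, as the ScriptEnv
    # does every other epoch
    sequences = [reber_string() for _ in range(25)]
    print(sequences[:5])

Because each call makes its random choices independently, regenerating a small batch of sequences every other epoch keeps the training set statistically representative of the grammar without ever enumerating it in advance.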
Because it takes a while to train, you can opt to load
a fully trained network and its training log.
To do so, Stop the network at any time. To load the network, do Object/Load in the network window, and select fsa.trained.net.gz. To load the log file, go to the Epoch_0_GraphLog, and do LogFile/Load File and select fsa.epc.log.
The network should take anywhere between 13 and 80 epochs to learn the problem to the point where it gets zero errors in one epoch (this was the range for ten random networks we ran). The pre-trained network took 15 epochs to get to this first zero, but we trained it longer (54 epochs total) to get it to the point where it got 4 zeros in a row. This stamping in of the representations makes them more robust to the noise, but the network still makes occasional errors even with this extra training.