with hidden layers between the input and output layers, resulting in an algorithm commonly called backpropagation that can learn even the "impossible" task shown in figure 5.7. Indeed, with enough hidden units, this algorithm can learn any function that uniquely maps input patterns to output patterns. Thus, backpropagation is a truly powerful mechanism for task-based learning.
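To make this concrete, the following is a minimal NumPy sketch of a backpropagation network with one hidden layer. It is not the PDP++/Leabra implementation used in these explorations; the XOR-style task, the network size, and the learning rate are illustrative assumptions standing in for the patterns of figure 5.7.

    import numpy as np

    rng = np.random.default_rng(0)

    # XOR: a mapping that no single layer of weights can learn.
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = np.array([[0.], [1.], [1.], [0.]])

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # One hidden layer of 4 units between the input and output layers.
    W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros(4)
    W2 = rng.normal(0, 1, (4, 1)); b2 = np.zeros(1)

    lrate = 0.5
    for epoch in range(10000):
        h = sigmoid(X @ W1 + b1)               # hidden re-representation
        out = sigmoid(h @ W2 + b2)             # output activations
        d_out = (out - y) * out * (1 - out)    # error at the output layer
        d_hid = (d_out @ W2.T) * h * (1 - h)   # error passed back to the hidden layer
        W2 -= lrate * h.T @ d_out; b2 -= lrate * d_out.sum(0)
        W1 -= lrate * X.T @ d_hid; b1 -= lrate * d_hid.sum(0)

    print(np.round(out, 2))  # typically close to the targets [0, 1, 1, 0]

The same delta-rule-style weight update, applied layer by layer through the hidden units, is all that distinguishes this from the two-layer pattern associator networks explored above.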
In chapter 3, we discussed the advantages of hidden units in mediating transformations of the input patterns. These transformations emphasize some aspects of the input patterns and deemphasize others, and we argued that by chaining together multiple stages of such transformations, one could achieve more powerful and complex mappings between input patterns and output patterns. Now, we are in a position to formalize some of these ideas by showing how a backpropagation network with hidden units can transform the input patterns in the IMPOSSIBLE pattern associator task, so that a subsequent stage of transformation from the hidden layer to the output layer can produce the desired mapping.
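As an illustration of such a two-stage transformation (using XOR as a stand-in, since the figure 5.7 patterns are not reproduced here), a hand-built hidden layer with one OR-like unit and one AND-like unit re-represents the inputs so that a single, delta-rule-style output stage can produce the mapping. The specific weights and thresholds below are assumptions chosen for the illustration.

    import numpy as np

    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = np.array([0., 1., 1., 0.])

    def step(z):
        return (z > 0).astype(float)

    # Fixed hidden layer: an OR-like unit and an AND-like unit.
    h_or = step(X @ np.array([1., 1.]) - 0.5)    # 0, 1, 1, 1
    h_and = step(X @ np.array([1., 1.]) - 1.5)   # 0, 0, 0, 1
    H = np.stack([h_or, h_and], axis=1)          # the re-represented patterns

    # In the new representation the target is simply OR minus AND,
    # so one layer of weights now suffices.
    out = step(H @ np.array([1., -1.]) - 0.5)
    print(out, y)   # out matches y exactly

Backpropagation's contribution is to learn such a hidden-layer transformation from the task itself, rather than having it wired in by hand.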
Another way to think about these transformations is in terms of re-representing the problem. Students of psychology may be familiar with insight problems, which require one to re-represent the problem before its (then relatively easy) solution can be found. For example, try to figure out how the following statement makes sense: "A man wants to go home, but he can't because a man with a mask is there." The words here have been chosen so that, out of context, they don't really make sense. However, if you are able to re-represent these words in the context of "baseball," then you will see that it all makes perfect sense.
Another example comes from simple tool use — a stick can be turned into a club by re-representing it as something other than a part of a tree. Indeed, much of cognition is based on the development of appropriate re-representations (transformations) of input patterns. From a computational perspective, computer scientists have long appreciated that selecting an appropriate representation is perhaps the most important step in designing an algorithm. Thus, the importance of developing learning mechanisms like backpropagation that can produce such re-representations in a series of hidden layers can hardly be overstated. We will return to this issue repeatedly throughout the text.
Set env_type to IMPOSSIBLE. Then do View, EVENTS.
Notice that each input unit in this environment (figure 5.7) is active equally often when the output is active as when it is inactive. These kinds of problems are called ambiguous cue problems, or nonlinear discrimination problems (Sutherland & Rudy, 1989; O'Reilly & Rudy, in press). This kind of problem might prove difficult, because every input unit will end up being equivocal about what the output should do. Nevertheless, the input patterns are not all the same — people could learn to solve this task fairly trivially by just paying attention to the overall patterns of activation. Let's see if the network can do this.
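To see what "active equally often" means in practice, here is a small check on a hypothetical four-input pattern set with the same ambiguous-cue structure (the actual figure 5.7 events are not reproduced here, so these patterns are an assumption).

    import numpy as np

    X = np.array([[1, 1, 0, 0],
                  [0, 0, 1, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1]], dtype=float)
    y = np.array([1, 1, 0, 0], dtype=float)

    on = X[y == 1].sum(axis=0)    # times each input is active when the output is on
    off = X[y == 0].sum(axis=0)   # times each input is active when the output is off
    print(on, off)                # identical counts: every input unit is equivocal

    # Equivalently, each input unit's correlation with the output is zero,
    # so a single layer of delta rule weights has no consistent signal to follow.
    print(np.corrcoef(X.T, y)[-1, :-1])

Only the overall pattern of activation (which inputs are on together) distinguishes the output-on events from the output-off events, which is exactly the kind of information a hidden layer can pick up on.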
Press Run on the general control panel.
Do it again. And again. Any luck?
Because the delta rule cannot learn what appears to be a relatively simple task, we conclude that something more powerful is necessary. Unfortunately, that is not the conclusion that Minsky and Papert (1969) reached in their highly influential book, Perceptrons. Instead, they concluded that neural networks were hopelessly inadequate because they could not solve problems like the one we just explored (specifically, they focused on the exclusive-or (XOR) task)! This conclusion played a large role in the waning of the early interest in neural network models of the 1960s. Interestingly, we will see that only a few more applications of the chain rule are necessary to remedy the problem, but this fact took a while to be appreciated by most people (roughly fifteen years, in fact).
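To preview those extra chain-rule steps in generic notation (\(\sigma\) for the unit activation function, \(\eta\) for a unit's net input, \(h_j\) for hidden activations, \(w_{jk}\) for hidden-to-output weights; these symbols are chosen for this sketch and are not necessarily the book's own): the first step is just the delta rule applied to the hidden-to-output weights, \(\delta_k = (t_k - o_k)\,\sigma'(\eta_k)\) with \(\Delta w_{jk} \propto \delta_k h_j\); one further application of the chain rule, summing over the output units each hidden unit feeds, yields \(\delta_j = \sigma'(\eta_j)\sum_k \delta_k w_{jk}\) and \(\Delta w_{ij} \propto \delta_j x_i\) for the input-to-hidden weights.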
Go to the PDP++Root window. To continue on to the next simulation, close this project first by selecting .projects/Remove/Project_0. Or, if you wish to stop now, quit by selecting Object/Quit.
5.6 The Generalized Delta Rule: Backpropagation
As we saw, the delta rule, though much better than Hebbian learning for task-based learning, also has its limits. What took people roughly fifteen years to really appreciate was that this limitation only applies to networks with only two layers (an input and output layer, as in the above pattern associator models). The delta rule can be relatively directly extended or generalized for networks