with hidden layers between the input and output layers, resulting in an algorithm commonly called backpropagation that can learn even the "impossible" task shown in figure 5.7. Indeed, with enough hidden units, this algorithm can learn any function that uniquely maps input patterns to output patterns. Thus, backpropagation is a truly powerful mechanism for task-based learning.
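To make this concrete, the following is a minimal NumPy sketch of a backpropagation network with one hidden layer. It is not the PDP++/Leabra implementation used in these explorations; the XOR-style task, the network size, and the learning rate are illustrative assumptions standing in for the patterns of figure 5.7.

    import numpy as np

    rng = np.random.default_rng(0)

    # XOR: a mapping that no single layer of weights can learn.
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = np.array([[0.], [1.], [1.], [0.]])

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # One hidden layer of 4 units between the input and output layers.
    W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros(4)
    W2 = rng.normal(0, 1, (4, 1)); b2 = np.zeros(1)

    lrate = 0.5
    for epoch in range(10000):
        h = sigmoid(X @ W1 + b1)               # hidden re-representation
        out = sigmoid(h @ W2 + b2)             # output activations
        d_out = (out - y) * out * (1 - out)    # error at the output layer
        d_hid = (d_out @ W2.T) * h * (1 - h)   # error passed back to the hidden layer
        W2 -= lrate * h.T @ d_out; b2 -= lrate * d_out.sum(0)
        W1 -= lrate * X.T @ d_hid; b1 -= lrate * d_hid.sum(0)

    print(np.round(out, 2))  # typically close to the targets [0, 1, 1, 0]

The same delta-rule-style weight update, applied layer by layer through the hidden units, is all that distinguishes this from the two-layer pattern associator networks explored above.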
In chapter 3, we discussed the advantages of hidden units in mediating transformations of the input patterns. These transformations emphasize some aspects of the input patterns and deemphasize others, and we argued that by chaining together multiple stages of such transformations, one could achieve more powerful and complex mappings between input patterns and output patterns. Now, we are in a position to formalize some of these ideas by showing how a backpropagation network with hidden units can transform the input patterns in the IMPOSSIBLE pattern associator task, so that a subsequent stage of transformation from the hidden layer to the output layer can produce the desired mapping.
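As an illustration of such a two-stage transformation (using XOR as a stand-in, since the figure 5.7 patterns are not reproduced here), a hand-built hidden layer with one OR-like unit and one AND-like unit re-represents the inputs so that a single, delta-rule-style output stage can produce the mapping. The specific weights and thresholds below are assumptions chosen for the illustration.

    import numpy as np

    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = np.array([0., 1., 1., 0.])

    def step(z):
        return (z > 0).astype(float)

    # Fixed hidden layer: an OR-like unit and an AND-like unit.
    h_or = step(X @ np.array([1., 1.]) - 0.5)    # 0, 1, 1, 1
    h_and = step(X @ np.array([1., 1.]) - 1.5)   # 0, 0, 0, 1
    H = np.stack([h_or, h_and], axis=1)          # the re-represented patterns

    # In the new representation the target is simply OR minus AND,
    # so one layer of weights now suffices.
    out = step(H @ np.array([1., -1.]) - 0.5)
    print(out, y)   # out matches y exactly

Backpropagation's contribution is to learn such a hidden-layer transformation from the task itself, rather than having it wired in by hand.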
Another way to think about these transformations is in terms of re-representing the problem. Students of psychology may be familiar with insight problems, which require one to re-represent the problem before its (then relatively easy) solution can be found. For example, try to figure out how the following statement makes sense: "A man wants to go home, but he can't because a man with a mask is there." The words here have been chosen so that, out of context, they don't really make sense. However, if you are able to re-represent these words in the context of "baseball," then you will see that it all makes perfect sense.
Another example comes from simple tool use — a stick can be turned into a club by re-representing it as something other than a part of a tree. Indeed, much of cognition is based on the development of appropriate re-representations (transformations) of input patterns. From a computational perspective, computer scientists have long appreciated that selecting an appropriate representation is perhaps the most important step in designing an algorithm. Thus, the importance of developing learning mechanisms like backpropagation that can produce such re-representations in a series of hidden layers can hardly be overstated. We will return to this issue repeatedly throughout the text.
Set env_type to IMPOSSIBLE. Then do View, EVENTS.
Notice that each input unit in this environment (figure 5.7) is active equally often when the output is active as when it is inactive. These kinds of problems are called ambiguous cue problems, or nonlinear discrimination problems (Sutherland & Rudy, 1989; O'Reilly & Rudy, in press). This kind of problem might prove difficult, because every input unit will end up being equivocal about what the output should do. Nevertheless, the input patterns are not all the same — people could learn to solve this task fairly trivially by just paying attention to the overall patterns of activation. Let's see if the network can do this.
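To see what "active equally often" means in practice, here is a small check on a hypothetical four-input pattern set with the same ambiguous-cue structure (the actual figure 5.7 events are not reproduced here, so these patterns are an assumption).

    import numpy as np

    X = np.array([[1, 1, 0, 0],
                  [0, 0, 1, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1]], dtype=float)
    y = np.array([1, 1, 0, 0], dtype=float)

    on = X[y == 1].sum(axis=0)    # times each input is active when the output is on
    off = X[y == 0].sum(axis=0)   # times each input is active when the output is off
    print(on, off)                # identical counts: every input unit is equivocal

    # Equivalently, each input unit's correlation with the output is zero,
    # so a single layer of delta rule weights has no consistent signal to follow.
    print(np.corrcoef(X.T, y)[-1, :-1])

Only the overall pattern of activation (which inputs are on together) distinguishes the output-on events from the output-off events, which is exactly the kind of information a hidden layer can pick up on.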
Press Run on the general control panel.
Do it again. And again. Any luck?
Because the delta rule cannot learn what appears to be a relatively simple task, we conclude that something more powerful is necessary. Unfortunately, that is not the conclusion that Minsky and Papert (1969) reached in their highly influential book, Perceptrons. Instead, they concluded that neural networks were hopelessly inadequate because they could not solve problems like the one we just explored (specifically, they focused on the exclusive-or (XOR) task)! This conclusion played a large role in the waning of the early interest in neural network models of the 1960s. Interestingly, we will see that only a few more applications of the chain rule are necessary to remedy the problem, but this fact took a while to be appreciated by most people (roughly fifteen years, in fact).
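To preview those extra chain-rule steps in generic notation (\(\sigma\) for the unit activation function, \(\eta\) for a unit's net input, \(h_j\) for hidden activations, \(w_{jk}\) for hidden-to-output weights; these symbols are chosen for this sketch and are not necessarily the book's own): the first step is just the delta rule applied to the hidden-to-output weights, \(\delta_k = (t_k - o_k)\,\sigma'(\eta_k)\) with \(\Delta w_{jk} \propto \delta_k h_j\); one further application of the chain rule, summing over the output units each hidden unit feeds, yields \(\delta_j = \sigma'(\eta_j)\sum_k \delta_k w_{jk}\) and \(\Delta w_{ij} \propto \delta_j x_i\) for the input-to-hidden weights.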
Go to the PDP++Root window. To continue on to the next simulation, close this project first by selecting .projects/Remove/Project_0. Or, if you wish to stop now, quit by selecting Object/Quit.
5.6 The Generalized Delta Rule: Backpropagation
As we saw, the delta rule, though much better than Hebbian learning for task-based learning, also has its limits. What took people roughly fifteen years to really appreciate was that this limitation only applies to networks with only two layers (an input and output layer, as in the above pattern associator models). The delta rule can be relatively directly extended or generalized for networks