cannot learn to be sensitive to which inputs are more
task relevant than others (unless this happens to be the
same as the input-output correlations, as in the easy
task). This hard task has a complicated pattern of over-
lap among the different input patterns. For the two
cases where the left output should be on, the middle
two input units are very strongly correlated with the
output activity (conditional probability P(x_i|y_j) = 1), while the outside two inputs are only half-correlated (a conditional probability of .5).
The two cases where the left output should be off (and the right one on) overlap considerably with those where it should be on, with the last event containing both of the highly correlated inputs. Thus, if the network just pays attention to correlations, it will tend to respond to this last case when it shouldn't.
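To make this correlational structure concrete, here is a minimal Python sketch that computes the conditional probabilities P(x_i|y_j) from a set of illustrative event patterns. The patterns below are hypothetical stand-ins chosen to match the description above, not necessarily the exact patterns used in the simulator.

import numpy as np

# Hypothetical events: four input units, two output units (left, right).
inputs = np.array([
    [1, 1, 1, 0],   # left output should be on
    [0, 1, 1, 1],   # left output should be on
    [1, 0, 0, 1],   # right output should be on
    [0, 1, 1, 0],   # right output should be on (last event: both middle inputs)
])
targets = np.array([
    [1, 0],
    [1, 0],
    [0, 1],
    [0, 1],
])

# Conditional probability P(x_i = 1 | y_j = 1) for each input i and output j
for j, name in enumerate(["left", "right"]):
    on_events = inputs[targets[:, j] == 1]
    print(name, on_events.mean(axis=0))

# For the "left" unit this prints [0.5, 1.0, 1.0, 0.5]: the middle two inputs
# are perfectly correlated with it, the outer two only half-correlated.
# A purely correlational (Hebbian) learner will therefore also respond to the
# last event, which contains both middle inputs but should map to "right".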
Let's see what happens when we run the network on
this task.
Press the Run button in the pat_assoc_ctrl, which does a New Init to produce a new set of random starting weights, and then does a Run. You should be viewing the weights of the left output unit in the network window, with the Display turned on so you can see them being updated as the network learns.
You should see from these weights that the network
has learned that the middle two units are highly corre-
lated with the left output unit, as we expected.
Do TestStep 4 times.
You should see that the network is not getting the
right answers. Different runs will produce slightly dif-
ferent results, but the middle two events should turn the
right output unit on, while the first and last either turn on
the left output, or produce weaker activation across both
output units (i.e., they are relatively equally excited).
The weights for the right output unit show that it has
strongly represented its correlation with the second in-
put unit, which explains the pattern of output responses.
This weight for the right output unit is stronger than those for the left output unit from the two middle inputs because of the different overall activity levels in the different input patterns; this difference in the expected activity level α affects the renormalization correction for the CPCA Hebbian learning rule as described earlier (note that even if this renormalization is set to a constant across the different events, the network still fails to learn).
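To see why the weights end up mirroring these conditional probabilities, here is a minimal sketch of the basic CPCA Hebbian update, dw_ij = lrate * y_j * (x_i - w_ij), without the renormalization and contrast-enhancement corrections discussed in the text. The learning rate value and the event patterns are illustrative assumptions, reusing the hypothetical patterns from the earlier sketch.

import numpy as np

rng = np.random.default_rng(0)
lrate = 0.01
n_inputs, n_outputs = 4, 2
w = rng.uniform(0.4, 0.6, size=(n_outputs, n_inputs))  # random initial weights

# Illustrative (hypothetical) event patterns, as in the earlier sketch.
inputs = np.array([[1, 1, 1, 0], [0, 1, 1, 1], [1, 0, 0, 1], [0, 1, 1, 0]], float)
targets = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], float)

for epoch in range(500):
    for x, y in zip(inputs, targets):
        # Outputs are clamped to their target values during training, so the
        # CPCA update moves each weight toward the current input value
        # whenever its output unit is active.
        w += lrate * y[:, None] * (x[None, :] - w)

print(np.round(w, 2))
# Row 0 (the left output) converges toward [0.5, 1.0, 1.0, 0.5]: the
# conditional probabilities, not the weights needed to get the hard task right.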
Do several more Runs on this HARD task.

Question 5.2 (a) Does the network ever solve the task? (b) Report the final sum_se at the end of training for each run.

Experiment with the parameters that control the contrast enhancement of the CPCA Hebbian learning rule (wt_gain and wt_off), to see if these are playing an important role in the network's behavior.
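For reference, the contrast-enhancement step that wt_gain and wt_off control passes each learned weight through a sigmoidal function before it is used. The sketch below assumes one common form of this function, w_hat = 1 / (1 + (wt_off * (1 - w) / w)^wt_gain); the exact parameterization and default values in the simulator may differ.

# Sketch of a sigmoidal weight contrast-enhancement function of the form
# assumed above; wt_gain sharpens the weight distribution and wt_off shifts
# the crossover point. (Illustrative only; check the simulator for the exact form.)
def contrast_enhance(w, wt_gain=6.0, wt_off=1.25):
    return 1.0 / (1.0 + (wt_off * (1.0 - w) / w) ** wt_gain)

for w in (0.3, 0.5, 0.7, 0.9):
    print(w, round(contrast_enhance(w), 3))
# Weights below the crossover are driven toward 0 and those above toward 1,
# but this only re-expresses the underlying correlations; it cannot turn
# correlation-based weights into task-appropriate ones.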
You should see that changes to these parameters do not lead to any substantial improvements. Hebbian learning does not seem to be able to solve tasks where the correlations do not provide the appropriate weight values. It seems unlikely that there will generally be a coincidence between correlational structure and the task solution. Thus, we must conclude that Hebbian learning is of limited use for task learning. In contrast, we will see in the next section that an algorithm specifically designed for task learning can learn this task without much difficulty.

To continue on to the next simulation, you can leave this project open because we will use it again. Or, if you wish to stop now, quit by selecting Object/Quit in the PDP++Root window.

5.3 Using Error to Learn: The Delta Rule

In this section we develop a task-based learning algorithm from first principles, and continue to refine this algorithm in the remainder of this chapter. In the next chapter, we will compare this new task-based learning mechanism with Hebbian learning, and provide a framework for understanding their relative advantages and disadvantages.

An obvious objective for task learning is to adapt the weights to produce the correct output pattern for each input pattern. To do this, we need a measure of how closely our network is producing the correct outputs, and then some way of improving this measure by adjusting the weights. We can use the summed squared error (SSE) statistic described previously to measure how close to correct the network is. First, we will want to extend this measure to the sum of SSE over all events, indexed by t, resulting in:
SSE = \sum_t \sum_k (t_k - o_k)^2    (5.2)
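As a concrete reading of equation 5.2, the following minimal sketch sums the squared differences between target values t_k and actual output activations o_k over all output units k and all events t; the particular target and output values below are made up for illustration.

import numpy as np

# One row per event: target t_k and actual output o_k for each output unit k.
targets = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], float)
outputs = np.array([[0.9, 0.2], [0.4, 0.6], [0.1, 0.8], [0.7, 0.5]], float)

# SSE = sum over events t and output units k of (t_k - o_k)^2
sse = np.sum((targets - outputs) ** 2)
print(sse)  # 0 only when every output matches its target on every event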