The lrn field shows the learning rate for the weights (lrate, always .01) and for the bias weights (bias_lrate, which is 0 for Hebbian learning because it has no way of training the bias weights, and is equal to lrate for the delta rule), and the proportion of Hebbian learning (hebb, 1 or 0; we will see in the next chapter that intermediate values of this parameter can be used as well).
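As a rough sketch of how these parameters interact (the rule forms and variable names below are assumptions for illustration, not the simulator's actual code), the hebb parameter selects between a CPCA-style Hebbian term and a delta-rule term, and only the delta rule gets a nonzero bias learning rate:

```python
# Illustrative sketch of how lrate, bias_lrate, and hebb gate learning;
# the specific update forms are assumptions, not the simulator's code.

lrate = 0.01  # weight learning rate (always .01 here)

def weight_update(x, act_m, act_p, w, hebb):
    """hebb = 1 selects Hebbian learning; hebb = 0 selects the delta rule."""
    hebbian = act_p * (x - w)        # CPCA-style Hebbian term (assumed form)
    delta = (act_p - act_m) * x      # delta-rule (error-driven) term
    return w + lrate * (hebb * hebbian + (1 - hebb) * delta)

def bias_update(act_m, act_p, b, hebb):
    bias_lrate = 0.0 if hebb else lrate  # Hebbian cannot train the bias
    return b + bias_lrate * (act_p - act_m)
```

Note that when the minus and plus phase activations agree, the delta-rule terms vanish and nothing changes, whereas the Hebbian term keeps pulling the weight toward the input value.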
Before training the network, we will explore how the
minus-plus activation phases work in the simulator.
Figure 5.7: The impossible pattern associator task mapping (Event_0 through Event_3), where there is complete overlap among the patterns that activate the different output units.
Make sure that you are monitoring activations (act) in the network, and set step_level to STEP_SETTLE instead of STEP_TRIAL in the control panel. This will increase the resolution of the stepping, so that each press of the Step button will perform the settling (iterative activation updating) process associated with each phase of processing.
This variability in the weights reflects a critical weakness of error-driven learning: it is lazy. Basically, once the output unit is performing the task correctly, learning effectively stops, with whatever weight values happened to do the trick. In contrast, Hebbian learning keeps adapting the weights to reflect the conditional probabilities, which, in this task, results in roughly the same final weight values regardless of what the initial random weights were. We will return to this issue later, when we discuss the benefits of using a combination of Hebbian and error-driven learning.
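This laziness can be illustrated with a toy setup (hypothetical, not the simulator's task): two input units are always active together with one output, so any pair of weights summing to the target satisfies the delta rule, while CPCA-style Hebbian learning drives every weight to the same conditional probability:

```python
# Toy contrast between "lazy" delta-rule learning and Hebbian learning.
# Hypothetical setup: two inputs, both always active, one linear output.

lrate = 0.1

def train_delta(w1, w2, trials=500):
    for _ in range(trials):
        y = w1 + w2                   # linear output with both inputs = 1
        err = 0.95 - y                # no more change once y hits the target
        w1 += lrate * err
        w2 += lrate * err
    return w1, w2                     # any pair summing to .95 will do

def train_hebb(w, trials=500):
    for _ in range(trials):
        w += lrate * 1.0 * (1.0 - w)  # CPCA-style: w -> P(input | output) = 1
    return w
```

Different initial weights leave the delta rule at different (equally correct) solutions, while the Hebbian weight ends up at 1.0 regardless of where it started.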
Now for the real test. Next, hit the Step button. You will see in the network the actual activation produced in response to the input pattern (also known as the expectation, response, or minus phase activation).
Now, hit Step again. You will see the target (also known as the outcome, instruction, or plus phase) activation. Learning occurs after this second, plus phase of activation. You can recognize targets because their activations are exactly .95 or 0; note that we are clamping activations to .95 and 0 because units cannot easily produce activations above .95 with typical net input values, due to the saturating nonlinearity of the rate code activation function. You can also switch to viewing targ in the network, which will show you the target inputs prior to the activation clamping. In addition, the minus phase activation is always viewable as act_m, and the plus phase as act_p.
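The two-phase bookkeeping can be sketched as follows (a minimal sketch assuming a logistic rate-code unit; the names act_m and act_p match the simulator's, the rest are illustrative):

```python
import math

def sigmoid(net):
    # Saturating rate-code activation: it cannot easily exceed .95 for
    # typical net inputs, which is why targets are clamped to .95 or 0.
    return 1.0 / (1.0 + math.exp(-net))

def trial(x, w, target_on, lrate=0.01):
    # Minus phase: the network's own expectation/response.
    act_m = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    # Plus phase: the outcome/instruction, clamped to .95 or 0.
    act_p = 0.95 if target_on else 0.0
    # Learning occurs only after this second, plus phase.
    dw = [lrate * (act_p - act_m) * xi for xi in x]
    return act_m, act_p, dw
```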
Now, let's monitor the weights.
Set env_type to HARD. Then, press Run. You should see that the network learns this task without much apparent difficulty. Thus, because the delta rule performs learning as a function of how well the network is actually doing, it can adapt the weights specifically to solve the task.
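The point can be sketched with a hypothetical overlapping-pattern task (these are not the simulator's actual HARD patterns): because each update is driven by the remaining error, and the bias weight is trained as well, the delta rule keeps pushing the output error toward zero even when the input patterns overlap.

```python
# Hypothetical overlapping patterns (shared middle input unit); the
# delta rule drives the output error toward zero, bias weight included.
lrate = 0.05
events = [([1, 1, 0], 0.95),   # event A: units 0 and 1 active, target on
          ([0, 1, 1], 0.0)]    # event B: units 1 and 2 active, target off

def delta_train(events, trials=2000):
    w, bias = [0.0, 0.0, 0.0], 0.0
    for _ in range(trials):
        for x, t in events:
            y = sum(wi * xi for wi, xi in zip(w, x)) + bias
            err = t - y                       # learning is error-driven
            for i, xi in enumerate(x):
                w[i] += lrate * err * xi
            bias += lrate * err               # the bias is trained as well
    return w, bias
```

A purely Hebbian rule, by contrast, only tracks conditional probabilities of co-activity and never sees this error signal, so it has no way to pull the overlapping weights apart.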
Question 5.3 (a) Compare and contrast, in a qualitative manner, the nature of the weights learned by the delta rule on this HARD task with those learned by the Hebbian rule (e.g., note where the largest weights tend to be); be sure to do multiple runs to get a general sense of what tends to be learned. (b) Using your answer to the first part, explain why the delta rule weights solve the problem but the Hebbian ones do not (don't forget to include the bias weights, bias.wt, in your analysis of the delta rule case).
Click on r.wt, and then on the left output unit. Then Run the process control panel to complete the training on this EASY task.
The network has no trouble learning this task. However, if you perform multiple Runs, you might notice that the final weight values are quite variable relative to the Hebbian case (you can always switch the LearnRule back to HEBB in the control panel to compare the two learning algorithms).
After this experience, you may think that the delta rule is all-powerful, but we can temper this enthusiasm and motivate the next section.