lrn field, which shows the learning rate for the weights (lrate, always .01) and for the bias weights (bias_lrate, which is 0 for Hebbian learning because it has no way of training the bias weights, and is equal to lrate for the delta rule), and the proportion of Hebbian learning (hebb, 1 or 0; we will see in the next chapter that intermediate values of this parameter can be used as well).
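The way these parameters combine can be sketched in a few lines of Python (a simplified illustration, not the simulator's actual code; the function names are hypothetical and the update equations are stand-ins for the forms developed in the text):

```python
# Hypothetical sketch of how lrate, bias_lrate, and hebb gate
# weight updates; simplified stand-ins for the text's equations.

def weight_update(x, y_m, y_p, w, lrate=0.01, hebb=0):
    """Mix Hebbian and delta rule weight changes.

    x: sending activation; y_m, y_p: receiving unit's minus- and
    plus-phase activations; hebb = 1 gives pure Hebbian learning,
    hebb = 0 gives the pure delta rule.
    """
    dwt_hebb = y_p * (x - w)       # CPCA-style Hebbian term
    dwt_err = (y_p - y_m) * x      # delta rule (error-driven) term
    return w + lrate * (hebb * dwt_hebb + (1 - hebb) * dwt_err)

def bias_update(y_m, y_p, bias_wt, lrate=0.01, hebb=0):
    """Bias weights change only under the delta rule: bias_lrate is
    0 for Hebbian learning (there is no error signal to drive it)
    and equal to lrate for the delta rule."""
    bias_lrate = 0.0 if hebb == 1 else lrate
    return bias_wt + bias_lrate * (y_p - y_m)
```

Note how the Hebbian term has no way to train the bias weight: it depends only on the sending and receiving activations, not on a target-versus-expectation difference.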
Before training the network, we will explore how the
minus-plus activation phases work in the simulator.
Figure 5.7: The impossible pattern associator task mapping (events Event_0 through Event_3), where there is complete overlap among the patterns that activate the different output units.
Make sure that you are monitoring activations in the network, and set step_level to STEP_SETTLE instead of STEP_TRIAL in the control panel.
This will increase the resolution of the stepping so
that each press of the Step button will perform the set-
tling (iterative activation updating) process associated
with each phase of processing.
This variability in the weights reflects a critical weakness of error-driven learning: it's lazy. Basically, once the output unit is performing the task correctly, learning effectively stops, with whatever weight values happened to do the trick. In contrast, Hebbian learning keeps adapting the weights to reflect the conditional probabilities, which, in this task, results in roughly the same final weight values regardless of what the initial random weights were. We will return to this issue later, when we discuss the benefits of using a combination of Hebbian and error-driven learning.
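This contrast can be demonstrated with a toy simulation (a hypothetical sketch, not the simulator's code): a CPCA-style Hebbian update drives each weight toward the conditional probability that its input is active when the receiving unit is, so runs started from very different initial weights converge to roughly the same values:

```python
import random

def cpca_hebb(patterns, w0, lrate=0.01, epochs=5000):
    """CPCA-style Hebbian learning: each weight w[i] moves toward
    P(x_i = 1 | y = 1), regardless of its initial value."""
    w = list(w0)
    for _ in range(epochs):
        x = random.choice(patterns)
        y = 1.0  # simplification: receiving unit active on every pattern
        for i in range(len(w)):
            w[i] += lrate * y * (x[i] - w[i])
    return w

random.seed(0)
# Input unit 0 is always on; input unit 1 is on half the time.
patterns = [(1, 0), (1, 1)]
w1 = cpca_hebb(patterns, [0.9, 0.1])
w2 = cpca_hebb(patterns, [0.1, 0.9])
# Both runs end near the conditional probabilities (about 1.0 and
# 0.5), even though the initial weights were opposite.
```

The delta rule has no such attractor: any weight configuration that produces the correct output is a stopping point, so the final weights inherit the arbitrariness of the random initialization.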
Now for the real test.
Next hit the Step button.
You will see in the network the actual activation produced in response to the input pattern (also known as the expectation, or response, or minus phase activation).
Now, hit Step again.
You will see the target (also known as the outcome, or instruction, or plus phase) activation. Learning occurs after this second, plus phase of activation. You can recognize targets because their activations are exactly .95 or 0. Note that we are clamping activations to .95 and 0 because units cannot easily produce activations above .95 with typical net input values, due to the saturating nonlinearity of the rate code activation function.
You can also switch to viewing the targ in the net-
work, which will show you the target inputs prior to the
activation clamping. In addition, the minus phase acti-
vation is always viewable as act_m and the plus phase
as act_p .
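Given the act_m and act_p values recorded in the two phases, the weight change computed after the plus phase can be sketched as (a simplified illustration; delta_rule_dwt is a hypothetical name, not a simulator function):

```python
# Hypothetical sketch of the phase-based delta rule update, using
# the minus- and plus-phase activations (act_m, act_p) described
# in the text.

def delta_rule_dwt(x, act_m, act_p, lrate=0.01):
    """Weight change after the plus phase: the difference between
    the target (plus) and expectation (minus) activations of the
    receiving unit, gated by the sending activation x."""
    return lrate * (act_p - act_m) * x

# When the expectation already matches the target (act_m == act_p),
# the weight change is zero and learning stops.
```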
Now, let's monitor the weights.
Set env_type to HARD . Then, press Run .
You should see that the network learns this task without much apparent difficulty. Thus, because the delta rule performs learning as a function of how well the network is actually doing, it can adapt the weights specifically to solve the task.
Question 5.3 (a) Compare and contrast in a qualita-
tive manner the nature of the weights learned by the
delta rule on this HARD task with those learned by the
Hebbian rule (e.g., note where the largest weights tend
to be) — be sure to do multiple runs to get a general
sense of what tends to be learned. (b) Using your an-
swer to the first part, explain why the delta rule weights
solve the problem, but the Hebbian ones do not (don't
forget to include the bias weights bias.wt in your
analysis of the delta rule case).
Click on r.wt , and then on the left output unit. Then
Run the process control panel to complete the training
on this EASY task.
The network has no trouble learning this task. However, if you perform multiple Runs, you might notice that the final weight values are quite variable relative to the Hebbian case (you can always switch the LearnRule back to HEBB in the control panel to compare the two learning algorithms).
After this experience, you may think that the delta
rule is all powerful, but we can temper this enthusiasm
and motivate the next section.