to read. Furthermore, we assume that each time step
corresponds to a different pattern of activity over the
inputs.

Let's assume that the weights into the receiving unit learn on each time step (input pattern) t according to a very simple Hebbian learning rule (formula for changing the weights) where the weight change for that pattern ($\Delta_t w_{ij}$, denoted dwt in the simulator) depends associatively on the activities of both presynaptic and postsynaptic units as follows:

\Delta_t w_{ij} = \epsilon x_i y_j    (4.2)

Figure 4.6: Schematic for how the correlations are computed via the simple Hebbian learning algorithm.

where $\epsilon$ is the learning rate parameter (lrate in the simulator) and i is the index of a particular input unit. The learning rate $\epsilon$ is an arbitrary constant that determines how rapidly the weights are updated as a function of each experience — we will see in chapter 9 that this is an important parameter for understanding the nature of memory. The weight change expression in equation 4.2 (and all the others developed subsequently) is used to update the weights using the following weight update equation:

w_{ij}(t+1) = w_{ij}(t) + \Delta_t w_{ij}    (4.3)
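
To make these two equations concrete, here is a minimal NumPy sketch (not the simulator's own code) of one learning step for a single linear receiving unit; the function name, the default lrate value, and the use of plain vectors instead of the simulator's data structures are illustrative assumptions:

    import numpy as np

    def hebbian_step(w, x, lrate=0.005):
        """One step of the simple Hebbian rule for a single linear receiving unit.

        w: weights into the receiving unit (one entry per input unit)
        x: the input activity pattern for this time step
        lrate: the learning rate parameter (epsilon, lrate in the simulator)
        """
        y = np.dot(x, w)       # equation 4.1: y_j = sum_k x_k w_kj
        dw = lrate * x * y     # equation 4.2: per-pattern weight change dwt
        return w + dw          # equation 4.3: w_ij(t+1) = w_ij(t) + dwt

Calling the function once for each input pattern in turn carries out the per-pattern updates that the rest of the derivation sums and averages over.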

To understand the overall effect of the weight change rule in equation 4.2, we want to know what is going to happen to the weights as a result of learning over all of the input patterns. This is easily expressed as just the sum of equation 4.2 over time (where again time t indexes the different input patterns):

\Delta w_{ij} = \epsilon \sum_t x_i y_j    (4.4)

We can analyze this sum of weight changes more easily if we assume that the arbitrary learning rate constant $\epsilon$ is equal to 1/N, where N is the total number of patterns in the input. This turns the sum into an average:

\Delta w_{ij} = \langle x_i y_j \rangle_t    (4.5)

(where the $\langle x \rangle_t$ notation indicates the average or expected value of variable x over patterns t).
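
A quick numeric check of this step, assuming randomly generated toy patterns (the array sizes and seed below are arbitrary), shows that summing the per-pattern changes of equation 4.4 with $\epsilon = 1/N$ gives exactly the average of equation 4.5:

    import numpy as np

    rng = np.random.default_rng(0)
    N = 100                            # number of input patterns (arbitrary)
    X = rng.normal(size=(N, 8))        # one toy input pattern per row
    w = rng.normal(size=8)             # weights into the receiving unit

    # Equation 4.4 with lrate = 1/N: sum the per-pattern changes x_i * y_j ...
    dw_sum = (1.0 / N) * sum(x * np.dot(x, w) for x in X)
    # ... which equals the average <x_i y_j>_t of equation 4.5.
    dw_avg = np.mean(X * (X @ w)[:, None], axis=0)
    print(np.allclose(dw_sum, dw_avg))   # True

Note that the check holds exactly only because the weights are held fixed while y is computed for every pattern, which is also what the averaging argument in the text assumes.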

If we substitute into this equation the formula for $y_j$ (equation 4.1, using the index k to sum over the inputs), which is just a linear function of the activities of all the other input units, we find that, after a little bit of algebra, the weight changes are a function of the correlations between the input units:

\Delta w_{ij} = \langle x_i \sum_k x_k w_{kj} \rangle_t = \sum_k \langle x_i x_k \rangle_t \, w_{kj} = \sum_k C_{ik} w_{kj}    (4.6)
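
The algebra in equation 4.6 can also be checked numerically; in this sketch (toy random patterns, weights held fixed while averaging, all sizes arbitrary) the averaged Hebbian change equals the correlation matrix times the weights:

    import numpy as np

    rng = np.random.default_rng(1)
    N = 500                          # number of patterns (arbitrary)
    X = rng.normal(size=(N, 5))      # toy input patterns, 5 input units
    w = rng.normal(size=5)           # weights into the receiving unit (fixed)

    # Average Hebbian change <x_i y_j>_t, with y_j = sum_k x_k w_kj (equation 4.1).
    dw = np.mean(X * (X @ w)[:, None], axis=0)

    # Correlation matrix C_ik = <x_i x_k>_t, the simple product average.
    C = (X.T @ X) / N
    print(np.allclose(dw, C @ w))    # True, matching equation 4.6

The matrix C computed here is exactly the correlation matrix that the text defines next.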

This new variable $C_{ik}$ is an element of the correlation matrix between the two input units i and k, where correlation is defined here as the expected value (average) of the product of their activity values over time ($C_{ik} = \langle x_i x_k \rangle_t$). You might be familiar with the more standard correlation measure:

C_{ik} = \frac{\langle (x_i - \mu_i)(x_k - \mu_k) \rangle_t}{\sqrt{\sigma_i^2 \sigma_k^2}}    (4.7)

which subtracts away the mean values ($\mu$) of the variables before taking their product, and normalizes the result by their variances ($\sigma^2$). Thus, an important simplification in this form of Hebbian correlational learning is that it assumes that the activation variables have zero mean and unit variance. We will see later that we can do away with this assumption for the form of Hebbian learning that we actually use.
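
The difference between the two correlation measures can be seen directly; the sketch below (toy data with nonzero mean and non-unit variance, all values arbitrary) computes the simple product average used by the Hebbian rule alongside the standardized measure of equation 4.7:

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.normal(loc=2.0, scale=3.0, size=(1000, 4))   # toy patterns

    # Simple correlation used by the Hebbian rule: C_ik = <x_i x_k>_t.
    C_simple = (X.T @ X) / X.shape[0]

    # Standard correlation (equation 4.7): subtract the means and normalize
    # by the standard deviations before averaging the products.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    C_standard = (Z.T @ Z) / X.shape[0]

    # The standardized measure matches NumPy's built-in correlation.
    print(np.allclose(C_standard, np.corrcoef(X, rowvar=False)))   # True

If the inputs already had zero mean and unit variance, the two matrices would agree, which is the simplification noted above.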

The main result in equation 4.6 is that the changes to the weight from input unit i are a weighted average over the different input units (indexed by k) of the correlation between these other input units and the particular input unit i (figure 4.6). Thus, where strong correlations exist across input units, the weights for those units