w_ij = P(x_i = 1 | y_j = 1) = P(x_i | y_j)    (4.11)

where the second form uses simplified notation that will
continue to be used below. We will call the learning rule
that achieves such weight values the CPCA algorithm.
The important characteristic of CPCA represented by
equation 4.11 is that the weights will reflect the extent
to which a given input unit is active across the subset
of input patterns represented by the receiving unit (i.e.,
conditioned on this receiving unit). If an input pattern
is a very typical aspect of such inputs, then the weights
from it will be large (near 1), and if it is not so typical,
they will be small (near 0).
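To make this concrete, here is a minimal sketch (not from the text or the simulator; the binary patterns and variable names are made up for illustration) that computes the target weight values of equation 4.11 directly from a set of input patterns:

```python
import numpy as np

# Hypothetical binary patterns: rows are input patterns, columns are sending
# units; y marks the patterns on which the receiving unit j is active.
x = np.array([[1, 0, 1],
              [1, 1, 0],
              [1, 0, 0],
              [0, 1, 1]])
y = np.array([1, 1, 1, 0])

# Equation 4.11: the target weight from sending unit i is the probability
# that it is on, conditioned on the receiving unit being active.
w_target = x[y == 1].mean(axis=0)
print(w_target)   # [1.0, 0.333..., 0.333...]
```

Sending unit 0 is on in every pattern for which the receiver is active, so its weight heads toward 1; the other two units are on only a third of the time, so their weights stay small, as described above.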
It is useful to relate the conditional probabilities com-
puted by CPCA to the correlations computed by PCA.
A conditional probability of .5 means zero correlation
(i.e., the input is equally likely to be on as off when
the receiving unit is active), and values larger than .5
indicate positive correlation (input more likely to be on
than off when the receiving unit is on), while values less
than .5 indicate negative correlation (input more likely
to be off than on when the receiving unit is on). Note
that there is at least one important difference between
conditional probabilities and correlations: conditional
probabilities depend on the direction you compute them
(i.e., P(a|b) ≠ P(b|a) in general), whereas correlations
come out the same way regardless of which way you
compute them.
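The following sketch (with made-up binary series, chosen only to illustrate the claim) computes both the conditional probability and the Pearson correlation for a zero-correlation input, a positively correlated input, and a negatively correlated input:

```python
import numpy as np

def cond_prob_and_corr(x, y):
    """Return P(x=1 | y=1) and the Pearson correlation of two binary series."""
    return x[y == 1].mean(), np.corrcoef(x, y)[0, 1]

y      = np.array([1, 1, 1, 1, 0, 0, 0, 0])   # receiver active half the time
x_zero = np.array([1, 1, 0, 0, 1, 1, 0, 0])   # on half the time y is on
x_pos  = np.array([1, 1, 1, 0, 1, 0, 0, 0])   # on more often when y is on
x_neg  = np.array([0, 0, 0, 1, 1, 1, 1, 0])   # on less often when y is on

print(cond_prob_and_corr(x_zero, y))   # (0.5,  0.0)  -> zero correlation
print(cond_prob_and_corr(x_pos, y))    # (0.75, 0.5)  -> positive correlation
print(cond_prob_and_corr(x_neg, y))    # (0.25, -0.5) -> negative correlation
```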
One further intuition regarding the relationship be-
tween CPCA and PCA is that the receiver's activa-
tion y_j, which serves as the conditionalizing factor in
CPCA, is a function of the other inputs to the unit, and
thus serves to make the weights from a given sending
unit dependent on the extent of its correlation with other
units, as reflected in y_j. Thus, the same basic mech-
anism is at work here in CPCA as we saw in PCA.
However, we will see later that CPCA is also capable of
reflecting other important conditionalizing factors that
determine when the receiving unit is active (e.g., com-
petition amongst the hidden units).
Following the analysis of Rumelhart and Zipser
(1986), we show below that the following weight up-
date rule achieves the CPCA conditional probability ob-
jective represented in equation 4.11:

Δw_ij = ε [y_j x_i - y_j w_ij] = ε y_j (x_i - w_ij)    (4.12)

where ε is again the learning rate parameter (lrate in
the simulator). The two equivalent forms of this equa-
tion are shown to emphasize the similarity of this learn-
ing rule to Oja's normalized PCA learning rule (equa-
tion 4.9), while also showing its simpler form. The
main difference between this and equation 4.9 is that
Oja's rule subtracts off the square of the activation times
the weight, whereas equation 4.12 subtracts off just the ac-
tivation times the weight. Thus we would expect
CPCA to produce weight changes roughly similar to those
of the Oja rule, with a difference in the way that normal-
ization works (also note that because the activations in
CPCA are all positive probability-like values, the differ-
ence in squaring the activation does not affect the sign
of the weight change).
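As a side-by-side illustration (our own comparison, with arbitrary example values rather than anything from the simulator), the sketch below computes one weight change under equation 4.12 and one under Oja's rule of equation 4.9:

```python
def cpca_dw(x_i, y_j, w_ij, lrate=0.01):
    """Equation 4.12: subtract the activation times the weight."""
    return lrate * y_j * (x_i - w_ij)            # = lrate * (y*x - y*w)

def oja_dw(x_i, y_j, w_ij, lrate=0.01):
    """Equation 4.9 (Oja's rule): subtract the squared activation times the weight."""
    return lrate * (y_j * x_i - y_j ** 2 * w_ij)

# Arbitrary probability-like values, chosen only for illustration.
x_i, y_j, w_ij = 1.0, 0.6, 0.4
print(cpca_dw(x_i, y_j, w_ij))   # ~0.0036
print(oja_dw(x_i, y_j, w_ij))    # ~0.00456
```

Both updates push the weight toward the input when the receiver is active; Oja's change comes out slightly larger here only because its decay term is scaled by the squared (hence smaller) activation.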
The second form of equation 4.12 emphasizes the fol-
lowing interpretation of what this form of learning is
accomplishing: the weights are adjusted to match the
value of the sending unit activation x_i (i.e., minimizing
the difference between x_i and w_ij), weighted in propor-
tion to the activation of the receiving unit (y_j). Thus,
if the receiving unit is not active, no weight adjustment
will occur (effectively, the receiving unit doesn't care
what happens to the input unit when it is not itself ac-
tive). If the receiving unit is very active (near 1), it cares
a lot about what the input unit's activation is, and tries
to set the weight to match it. As these individual weight
changes are accumulated together with a slow learning
rate, the weight will come to approximate the expected
value of the sending unit when the receiver is active (in
other words, equation 4.11).
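The following simulation sketch (a hypothetical environment of our own construction, with a small learning rate) shows this averaging effect: repeatedly applying equation 4.12 drives the weight toward the expected value of the sending unit when the receiver is active, here 0.75:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical environment (made up for illustration): the receiving unit is
# active on half the trials, and when it is active the sending unit is on
# with probability 0.75 (0.25 otherwise).
def sample_trial():
    y_j = float(rng.random() < 0.5)
    p_on = 0.75 if y_j == 1.0 else 0.25
    x_i = float(rng.random() < p_on)
    return x_i, y_j

w_ij, lrate = 0.5, 0.005
for _ in range(20000):
    x_i, y_j = sample_trial()
    w_ij += lrate * y_j * (x_i - w_ij)   # equation 4.12

print(w_ij)   # settles near 0.75 = P(x_i = 1 | y_j = 1)
```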
The next section shows formally how the weight up-
date rule in equation 4.12 implements the conditional
probability objective in equation 4.11. For those who
are less mathematically inclined, this analysis can be
skipped without significantly impacting the ability to
understand what follows.
4.5.2 Derivation of CPCA Learning Rule
This analysis, based on that of Rumelhart and Zipser
(1986), uses the same technique we used above to un-
derstand Oja's normalized PCA learning rule. Thus, we
will work backward from the weight update equation
(equation 4.12), and, by setting this to zero and thus
finding the equilibrium (asymptotic) weight value, show
that it satisfies the conditional probability objective of
equation 4.11.
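As a brief preview of that analysis (our own summary of the equilibrium argument, assuming binary 0/1 activations and writing angle brackets for expected values over the input patterns), setting the average weight change of equation 4.12 to zero and solving for the weight recovers the conditional probability of equation 4.11:

```latex
% Equilibrium sketch: average update of equation 4.12 set to zero.
\langle \Delta w_{ij} \rangle
  = \epsilon \bigl( \langle y_j x_i \rangle - \langle y_j \rangle \, w_{ij} \bigr) = 0
\quad\Longrightarrow\quad
w_{ij} = \frac{\langle y_j x_i \rangle}{\langle y_j \rangle}
       = P(x_i = 1 \mid y_j = 1)
% The last step assumes binary activations, where <y_j x_i> = P(x_i=1, y_j=1)
% and <y_j> = P(y_j=1).
```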