w_ij = P(x_i = 1 | y_j = 1) = P(x_i | y_j)    (4.11)

where the second form uses simplified notation that will
continue to be used below. We will call the learning rule
that achieves such weight values the CPCA algorithm.
The important characteristic of CPCA represented by
equation 4.11 is that the weights will reflect the extent
to which a given input unit is active across the subset
of input patterns represented by the receiving unit (i.e.,
conditioned on this receiving unit). If an input pattern
is a very typical aspect of such inputs, then the weights
from it will be large (near 1), and if it is not so typical,
they will be small (near 0).
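To make this concrete, here is a minimal sketch (not from the text or the simulator; the binary patterns and variable names are made up for illustration) that computes the target weight values of equation 4.11 directly from a set of input patterns:

```python
import numpy as np

# Hypothetical binary patterns: rows are input patterns, columns are sending
# units; y marks the patterns on which the receiving unit j is active.
x = np.array([[1, 0, 1],
              [1, 1, 0],
              [1, 0, 0],
              [0, 1, 1]])
y = np.array([1, 1, 1, 0])

# Equation 4.11: the target weight from sending unit i is the probability
# that it is on, conditioned on the receiving unit being active.
w_target = x[y == 1].mean(axis=0)
print(w_target)   # [1.0, 0.333..., 0.333...]
```

Sending unit 0 is on in every pattern for which the receiver is active, so its weight heads toward 1; the other two units are on only a third of the time, so their weights stay small, as described above.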
It is useful to relate the conditional probabilities com-
puted by CPCA to the correlations computed by PCA.
A conditional probability of .5 means zero correlation
(i.e., the input is equally likely to be on as off when
the receiving unit is active), and values larger than .5
indicate positive correlation (input more likely to be on
than off when the receiving unit is on), while values less
than .5 indicate negative correlation (input more likely
to be off than on when the receiving unit is on). Note
that there is at least one important difference between
conditional probabilities and correlations: conditional
probabilities depend on the direction you compute them
(i.e., P(a|b) ≠ P(b|a) in general), whereas correlations
come out the same way regardless of which way you
compute them.
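The following sketch (with made-up binary series, chosen only to illustrate the claim) computes both the conditional probability and the Pearson correlation for a zero-correlation input, a positively correlated input, and a negatively correlated input:

```python
import numpy as np

def cond_prob_and_corr(x, y):
    """Return P(x=1 | y=1) and the Pearson correlation of two binary series."""
    return x[y == 1].mean(), np.corrcoef(x, y)[0, 1]

y      = np.array([1, 1, 1, 1, 0, 0, 0, 0])   # receiver active half the time
x_zero = np.array([1, 1, 0, 0, 1, 1, 0, 0])   # on half the time y is on
x_pos  = np.array([1, 1, 1, 0, 1, 0, 0, 0])   # on more often when y is on
x_neg  = np.array([0, 0, 0, 1, 1, 1, 1, 0])   # on less often when y is on

print(cond_prob_and_corr(x_zero, y))   # (0.5,  0.0)  -> zero correlation
print(cond_prob_and_corr(x_pos, y))    # (0.75, 0.5)  -> positive correlation
print(cond_prob_and_corr(x_neg, y))    # (0.25, -0.5) -> negative correlation
```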
One further intuition regarding the relationship be-
tween CPCA and PCA is that the receiver's activa-
tion y_j, which serves as the conditionalizing factor in
CPCA, is a function of the other inputs to the unit, and
thus serves to make the weights from a given sending
unit dependent on the extent of its correlation with other
units, as reflected in y_j. Thus, the same basic mech-
anism is at work here in CPCA as we saw in PCA.
However, we will see later that CPCA is also capable of
reflecting other important conditionalizing factors that
determine when the receiving unit is active (e.g., com-
petition amongst the hidden units).
Following the analysis of Rumelhart and Zipser
(1986), we show below that the following weight up-
date rule achieves the CPCA conditional probability ob-
jective represented in equation 4.11:

Δw_ij = ε [y_j x_i - y_j w_ij] = ε y_j (x_i - w_ij)    (4.12)

where ε is again the learning rate parameter (lrate in
the simulator). The two equivalent forms of this equa-
tion are shown to emphasize the similarity of this learn-
ing rule to Oja's normalized PCA learning rule (equa-
tion 4.9), while also showing its simpler form. The
main difference between this and equation 4.9 is that
Oja's rule subtracts off the square of the activation times
the weight, whereas equation 4.12 subtracts off just the ac-
tivation times the weight. Thus we would expect
CPCA to produce weight changes roughly similar to those
of the Oja rule, with a difference in the way that normal-
ization works (also note that because the activations in
CPCA are all positive probability-like values, the differ-
ence in squaring the activation does not affect the sign
of the weight change).
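As a side-by-side illustration (our own comparison, with arbitrary example values rather than anything from the simulator), the sketch below computes one weight change under equation 4.12 and one under Oja's rule of equation 4.9:

```python
def cpca_dw(x_i, y_j, w_ij, lrate=0.01):
    """Equation 4.12: subtract the activation times the weight."""
    return lrate * y_j * (x_i - w_ij)            # = lrate * (y*x - y*w)

def oja_dw(x_i, y_j, w_ij, lrate=0.01):
    """Equation 4.9 (Oja's rule): subtract the squared activation times the weight."""
    return lrate * (y_j * x_i - y_j ** 2 * w_ij)

# Arbitrary probability-like values, chosen only for illustration.
x_i, y_j, w_ij = 1.0, 0.6, 0.4
print(cpca_dw(x_i, y_j, w_ij))   # ~0.0036
print(oja_dw(x_i, y_j, w_ij))    # ~0.00456
```

Both updates push the weight toward the input when the receiver is active; Oja's change comes out slightly larger here only because its decay term is scaled by the squared (hence smaller) activation.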
The second form of equation 4.12 emphasizes the fol-
lowing interpretation of what this form of learning is
accomplishing: the weights are adjusted to match the
value of the sending unit activation x_i (i.e., minimizing
the difference between x_i and w_ij), weighted in propor-
tion to the activation of the receiving unit (y_j). Thus,
if the receiving unit is not active, no weight adjustment
will occur (effectively, the receiving unit doesn't care
what happens to the input unit when it is not itself ac-
tive). If the receiving unit is very active (near 1), it cares
a lot about what the input unit's activation is, and tries
to set the weight to match it. As these individual weight
changes are accumulated together with a slow learning
rate, the weight will come to approximate the expected
value of the sending unit when the receiver is active (in
other words, equation 4.11).
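The following simulation sketch (a hypothetical environment of our own construction, with a small learning rate) shows this averaging effect: repeatedly applying equation 4.12 drives the weight toward the expected value of the sending unit when the receiver is active, here 0.75:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical environment (made up for illustration): the receiving unit is
# active on half the trials, and when it is active the sending unit is on
# with probability 0.75 (0.25 otherwise).
def sample_trial():
    y_j = float(rng.random() < 0.5)
    p_on = 0.75 if y_j == 1.0 else 0.25
    x_i = float(rng.random() < p_on)
    return x_i, y_j

w_ij, lrate = 0.5, 0.005
for _ in range(20000):
    x_i, y_j = sample_trial()
    w_ij += lrate * y_j * (x_i - w_ij)   # equation 4.12

print(w_ij)   # settles near 0.75 = P(x_i = 1 | y_j = 1)
```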
The next section shows formally how the weight up-
date rule in equation 4.12 implements the conditional
probability objective in equation 4.11. For those who
are less mathematically inclined, this analysis can be
skipped without significantly impacting the ability to
understand what follows.
4.5.2 Derivation of CPCA Learning Rule
This analysis, based on that of Rumelhart and Zipser
(1986), uses the same technique we used above to un-
derstand Oja's normalized PCA learning rule. Thus, we
will work backward from the weight update equation
(equation 4.12), and, by setting this to zero and thus
finding the equilibrium (asymptotic) weight value, show
that it satisfies the conditional probability objective of
equation 4.11.
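As a brief preview of that analysis (our own summary of the equilibrium argument, assuming binary 0/1 activations and writing angle brackets for expected values over the input patterns), setting the average weight change of equation 4.12 to zero and solving for the weight recovers the conditional probability of equation 4.11:

```latex
% Equilibrium sketch: average update of equation 4.12 set to zero.
\langle \Delta w_{ij} \rangle
  = \epsilon \bigl( \langle y_j x_i \rangle - \langle y_j \rangle \, w_{ij} \bigr) = 0
\quad\Longrightarrow\quad
w_{ij} = \frac{\langle y_j x_i \rangle}{\langle y_j \rangle}
       = P(x_i = 1 \mid y_j = 1)
% The last step assumes binary activations, where <y_j x_i> = P(x_i=1, y_j=1)
% and <y_j> = P(y_j=1).
```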