inputs. This will keep the weights from growing without bound. Finally, because it is primarily based on the same correlation terms C_ik as the previous simple Hebbian learning rule, this Oja rule still computes the first principal component of the input data (though the proof of this is somewhat more involved; see Hertz et al., 1991, for a nice treatment).
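To make the self-normalizing behavior concrete, here is a minimal NumPy sketch of an Oja-style update for a single linear unit; the toy data, variable names, and learning rate are our own illustrative choices, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy inputs whose first dimension carries the strongest correlations
X = rng.normal(size=(5000, 4)) @ np.diag([2.0, 1.0, 0.5, 0.25])

w = rng.normal(scale=0.1, size=4)        # receiving weights of one unit
lrate = 0.01
for x in X:
    y = w @ x                            # linear activation
    w += lrate * y * (x - y * w)         # Hebbian term minus weight decay

# the weight vector stays bounded (near unit length) and lines up,
# up to sign, with the first principal component of the inputs
pc1 = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)[2][0]
print(np.linalg.norm(w), abs(w / np.linalg.norm(w) @ pc1))  # ~1.0, ~1.0
```

The decay term grows with the square of the activation, which is what prevents the runaway weight growth of the plain Hebbian rule.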
4.5 Conditional Principal Components Analysis
To this point, we have considered only a single receiving unit and seen how it can represent the principal component of the correlations in the inputs. In this section, we explore ways in which this simple form of PCA can be extended to the case where there is an entire layer of receiving (hidden) units. To see that the simple PCA rule will not work directly in this context, consider what would happen if we just added multiple hidden units using the same activation function and learning rule as the unit we analyzed above. They would all end up learning exactly the same pattern of weights, because there is usually only one strongest (first) principal component of the input correlation matrix, and these algorithms are guaranteed to find it.
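To see the redundancy problem directly, one can train several independently initialized units with the same Oja-style rule on the same inputs; in this sketch (again our own toy example, not from the text) every unit ends up with essentially the same weight vector, up to sign.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 4)) @ np.diag([2.0, 1.0, 0.5, 0.25])

# three hidden units, each with its own random initial weights
W = rng.normal(scale=0.1, size=(3, 4))
lrate = 0.01
for x in X:
    y = W @ x                                               # each unit's activation
    W += lrate * (np.outer(y, x) - (y ** 2)[:, None] * W)   # Oja rule per row

# every row converges to (plus or minus) the first principal component,
# so the pairwise cosines between the learned weight vectors are all ~1
Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
print(np.abs(Wn @ Wn.T))
```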
There are two general ways of dealing with this problem, both of which involve introducing some kind of interaction between the hidden units to make them do different things. Thus, the problem here is not so much in the form of the weight update rule per se, but rather in the overall activation dynamics of the hidden units. It is important to appreciate this intimate relationship between activation dynamics and learning: indeed, we will repeatedly see that how a unit behaves (i.e., its activation dynamics) determines to a great extent what it learns.
One solution to the problem of redundant hidden units is to introduce specialized lateral connectivity between the units, configured to ensure that subsequent units end up representing the sequentially weaker components of the input correlation matrix (Sanger, 1989; Oja, 1989). Thus, an explicit ordering is imposed on the hidden units, such that the first unit ends up representing the principal component with the strongest correlations, the next unit gets the next strongest, and so on. We
will call this sequential principal components analysis (SPCA).

Figure 4.8: Sequential PCA (SPCA) performed on small patches drawn from images of natural scenes, with the first principal component (the "blob") in the upper left, and subsequent components following to the right and down. Each square in the large grid shows a grid of receiving weights for one of 64 hidden units, from a common layer of input units. Figure reproduced from Olshausen and Field (1996).

This solution is unsatisfactory for several reasons, from computational-level principles to available data about how populations of neurons encode information.
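As a rough sketch of how such an ordering can be enforced, Sanger's (1989) rule (often called the generalized Hebbian algorithm) subtracts out the contributions of all earlier units from each unit's Hebbian term; the code below is our own minimal NumPy rendering of that idea, not an implementation from the text.

```python
import numpy as np

def sanger_update(W, x, lrate=0.005):
    """One step of Sanger's rule (generalized Hebbian algorithm).

    W has shape (n_hidden, n_input); the lower-triangular term removes
    the parts of the input already accounted for by earlier units, so
    row k is driven toward the k-th principal component.
    """
    y = W @ x                                            # linear activations
    W += lrate * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W

rng = np.random.default_rng(2)
X = rng.normal(size=(5000, 4)) @ np.diag([2.0, 1.0, 0.5, 0.25])
W = rng.normal(scale=0.1, size=(3, 4))
for x in X:
    W = sanger_update(W, x)
# rows of W now approximate the first three principal components, in order
```

The lower-triangular term is what imposes the explicit ordering on the hidden units described above.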
For example, SPCA assumes that all input patterns share some common set of correlations (which would be represented by the first principal component), and that individual patterns can be differentiated by sequentially finer and finer distinctions represented by the subsequent components. This amounts to an assumption of hierarchical structure, where there is some central overarching tendency shared by everything in the environment, with individuals being special cases of this overall principal component. In contrast, the world may be more of a heterarchy, with lots of separate categories of things that exist at roughly the same level.