You can now see the entire progression from the initial weights to the line representation while looking at the weights.

The simple point of this exploration is that Hebbian learning will tend to cause units to represent stable things in the environment. This is a particularly simple case where there was just a single thing in the environment, but it serves to convey the point. We will explore more interesting cases later. But first, let's try to understand the mathematical and conceptual basis of Hebbian learning.
Go to the PDP++Root window. To continue on to the next simulation, close this project first by selecting .projects/Remove/Project_0. Or, if you wish to stop now, quit by selecting Object/Quit.
4.4 Principal Components Analysis
The general mathematical framework that we use to understand why Hebbian learning causes units to represent correlations in the environment is called principal components analysis (PCA). As the name suggests, PCA is all about representing the major (principal) structural elements (components) of the correlational structure of an environment.[1] By focusing on the principal components of correlational structure, this framework holds out the promise of developing a reasonably parsimonious model. Thus, it provides a useful overall mathematical level of analysis within which to understand the effects of learning. Further, we will see that PCA can be implemented using a simple associative or Hebbian learning mechanism like the NMDA-mediated synaptic modification described earlier.
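As a rough illustration of what "principal components of correlational structure" means computationally, here is a minimal sketch (the 5x5 input grid, the particular line, and the noise level are assumptions for illustration, in the spirit of the exploration above) that computes the correlation matrix of the input patterns and extracts its strongest component as the leading eigenvector:

```python
import numpy as np

# Hypothetical environment: 25 input units (a 5x5 grid) repeatedly presented
# with one line plus occasional random noise pixels (assumed for illustration).
rng = np.random.default_rng(0)
line = np.zeros((5, 5))
line[2, :] = 1.0                                  # a horizontal line (assumption)
line = line.flatten()
patterns = np.array([np.clip(line + (rng.random(25) < 0.1), 0.0, 1.0)
                     for _ in range(200)])

# Correlation (second-moment) matrix over the input units.
C = patterns.T @ patterns / len(patterns)         # 25 x 25

# The first (strongest) principal component is the eigenvector of C
# with the largest eigenvalue.
evals, evecs = np.linalg.eigh(C)
first_pc = evecs[:, np.argmax(evals)]
print(np.round(first_pc.reshape(5, 5), 2))        # weights concentrate on the line units (up to sign)
```

The point of the sketch is only that the strongest eigenvector of the input correlation matrix picks out the stable, repeatedly present feature; the rest of the section shows how Hebbian learning can arrive at the same quantity without ever forming the matrix explicitly.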
In what follows, we develop a particularly useful form of Hebbian learning that performs a version of PCA. We will rely on a combination of top-down mathematical derivations and bottom-up intuitions about how weights should be adapted, and we will find that there is a nice convergence between these levels of analysis. To make the fundamental computations clear, we will start with the simplest form of Hebbian learning, and then incrementally fix various problems until we arrive at the algorithm that we can actually use in our simulations. The objective of this presentation is to show how the main theme of extracting and representing the principal components of correlational structure from the environment using Hebbian learning can be implemented in a simple, biologically plausible, and computationally effective manner.

The culmination of this effort comes in the use of this Hebbian learning in a self-organizing network that achieves the model learning objective of representing the general statistical structure of the environment, as we will explore in a simple and easily comprehensible case. Chapter 8 provides a more impressive demonstration of the capabilities of this type of model learning in replicating the way that the visual system represents the statistical structure contained in actual visual scenes of the natural environment.
[1] Note that although PCA technically refers to the extraction of all of the principal components of correlation (which can be arranged in sequential order from first (strongest) to last (weakest)), we will use the term to refer only to the strongest such components.
4.4.1 Simple Hebbian PCA in One Linear Unit
To capture the simplest version of Hebbian correlational learning, we focus first on the case of a single linear receiving unit that receives input from a set of input units. We will see that a simple Hebbian learning equation will result in the unit extracting the first principal component of correlation in the patterns of activity over the input units.
Imagine that there is an environment that produces activity patterns over the input units, such that there are certain correlations among these input units. For concreteness, let's consider the simple case where the environment is just the one line shown in figure 4.4, which is repeatedly presented to a set of 25 input units. Because it is linear, the receiving unit's activation function is just the weighted sum of its inputs (figure 4.6):

y_j = \sum_k x_k w_{kj}    (4.1)

where k (rather than the usual i) indexes over input units, for reasons that will become clear. Also in this and all subsequent equations in this chapter, all of the variables are a function of the current time step t, normally designated by the (t) notation after every variable; however, we drop the (t)'s to make things easier to read.
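To make equation 4.1 and the resulting learning dynamics concrete, here is a minimal sketch (not the book's PDP++ simulation; the environment, learning rate, and the explicit weight renormalization are assumptions) of a single linear unit trained with the plain Hebbian update Δw_k = ε x_k y. The renormalization is only there to keep the weights bounded; the text goes on to fix this kind of problem in a more principled way. After training, the weight vector lines up with the first principal component of the input correlations:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed environment: 25 input units showing one noisy line, as above.
line = np.zeros((5, 5))
line[2, :] = 1.0
line = line.flatten()

def sample():
    return np.clip(line + (rng.random(25) < 0.1), 0.0, 1.0)

w = 0.1 * rng.random(25)              # small random initial weights
eps = 0.005                           # learning rate (assumed)

for _ in range(2000):
    x = sample()
    y = x @ w                         # eq. 4.1: linear unit = weighted sum of inputs
    w += eps * x * y                  # simple Hebbian weight update
    w /= np.linalg.norm(w)            # renormalize so the weights stay bounded (assumption)

# Compare against the first principal component of the input correlation matrix.
C = np.mean([np.outer(s, s) for s in [sample() for _ in range(500)]], axis=0)
evals, evecs = np.linalg.eigh(C)
pc1 = evecs[:, np.argmax(evals)]
print("alignment with first PC:", abs(w @ pc1))   # close to 1.0
```

Because both vectors are normalized, the printed value is the cosine similarity between the learned weight vector and the first principal component; values near 1 indicate that the unit has come to represent the strongest component of correlation in its inputs.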