inputs. This will keep the weights from growing without bound. Finally, because it is primarily based on the same correlation terms C_ik as the previous simple Hebbian learning rule, this Oja rule still computes the first principal component of the input data (though the proof of this is somewhat more involved; see Hertz et al., 1991, for a nice treatment).
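To make the self-normalizing behavior concrete, here is a minimal NumPy sketch of an Oja-style update for a single linear unit; the toy data, variable names, and learning rate are our own illustrative choices, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy inputs whose first dimension carries the strongest correlations
X = rng.normal(size=(5000, 4)) @ np.diag([2.0, 1.0, 0.5, 0.25])

w = rng.normal(scale=0.1, size=4)        # receiving weights of one unit
lrate = 0.01
for x in X:
    y = w @ x                            # linear activation
    w += lrate * y * (x - y * w)         # Hebbian term minus weight decay

# the weight vector stays bounded (near unit length) and lines up,
# up to sign, with the first principal component of the inputs
pc1 = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)[2][0]
print(np.linalg.norm(w), abs(w / np.linalg.norm(w) @ pc1))  # ~1.0, ~1.0
```

The decay term grows with the square of the activation, which is what prevents the runaway weight growth of the plain Hebbian rule.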
4.5 Conditional Principal Components Analysis
To this point, we have considered only a single receiving unit and seen how it can represent the principal component of the correlations in the inputs. In this section, we explore ways in which this simple form of PCA can be extended to the case where there is an entire layer of receiving (hidden) units. To see that the simple PCA rule will not work directly in this context, consider what would happen if we just added multiple hidden units using the same activation function and learning rule as the unit we analyzed above. They would all end up learning exactly the same pattern of weights, because there is usually only one strongest (first) principal component of the input correlation matrix, and these algorithms are guaranteed to find it.
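To see the redundancy problem directly, one can train several independently initialized units with the same Oja-style rule on the same inputs; in this sketch (again our own toy example, not from the text) every unit ends up with essentially the same weight vector, up to sign.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 4)) @ np.diag([2.0, 1.0, 0.5, 0.25])

# three hidden units, each with its own random initial weights
W = rng.normal(scale=0.1, size=(3, 4))
lrate = 0.01
for x in X:
    y = W @ x                                               # each unit's activation
    W += lrate * (np.outer(y, x) - (y ** 2)[:, None] * W)   # Oja rule per row

# every row converges to (plus or minus) the first principal component,
# so the pairwise cosines between the learned weight vectors are all ~1
Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
print(np.abs(Wn @ Wn.T))
```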
There are two general ways of dealing with this problem, both of which involve introducing some kind of interaction between the hidden units to make them do different things. Thus, the problem here is not so much in the form of the weight update rule per se, but rather in the overall activation dynamics of the hidden units. It is important to appreciate this intimate relationship between activation dynamics and learning: indeed, we will repeatedly see that how a unit behaves (i.e., its activation dynamics) determines to a great extent what it learns.
One solution to the problem of redundant hidden units is to introduce specialized lateral connectivity between the units, configured to ensure that subsequent units end up representing the sequentially weaker components of the input correlation matrix (Sanger, 1989; Oja, 1989). Thus, an explicit ordering is imposed on the hidden units, such that the first unit ends up representing the principal component with the strongest correlations, the next unit gets the next strongest, and so on. We
will call this sequential principal components analysis (SPCA).

Figure 4.8: Sequential PCA (SPCA) performed on small patches drawn from images of natural scenes, with the first principal component (the "blob") in the upper left, and subsequent components following to the right and down. Each square in the large grid shows a grid of receiving weights for one of 64 hidden units, from a common layer of input units. Figure reproduced from Olshausen and Field (1996).

This solution is unsatisfactory for several reasons, from computational-level principles to available data about how populations of neurons encode information.
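As a rough sketch of how such an ordering can be enforced, Sanger's (1989) rule (often called the generalized Hebbian algorithm) subtracts out the contributions of all earlier units from each unit's Hebbian term; the code below is our own minimal NumPy rendering of that idea, not an implementation from the text.

```python
import numpy as np

def sanger_update(W, x, lrate=0.005):
    """One step of Sanger's rule (generalized Hebbian algorithm).

    W has shape (n_hidden, n_input); the lower-triangular term removes
    the parts of the input already accounted for by earlier units, so
    row k is driven toward the k-th principal component.
    """
    y = W @ x                                            # linear activations
    W += lrate * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W

rng = np.random.default_rng(2)
X = rng.normal(size=(5000, 4)) @ np.diag([2.0, 1.0, 0.5, 0.25])
W = rng.normal(scale=0.1, size=(3, 4))
for x in X:
    W = sanger_update(W, x)
# rows of W now approximate the first three principal components, in order
```

The lower-triangular term is what imposes the explicit ordering on the hidden units described above.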
For example, SPCA assumes that all input patterns share some common set of correlations (which would be represented by the first principal component), and that individual patterns can be differentiated by sequentially finer and finer distinctions represented by the subsequent components. This amounts to an assumption of hierarchical structure, where there is some central overarching tendency shared by everything in the environment, with individuals being special cases of this overall principal component. In contrast, the world may be more of a heterarchy, with lots of separate categories of things that exist at roughly the same level.