ships among different input patterns. The neighborhood of activity around the single winner in a Kohonen network causes the hidden units to represent patterns similar to those of their neighbors, so that topographic maps develop over the course of learning.
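For concreteness, the following is a minimal sketch of a Kohonen-style neighborhood update, assuming a one-dimensional map, a Gaussian neighborhood, and illustrative parameter names (lr, sigma); it is not the specific implementation used in the models discussed here.

```python
import numpy as np

def som_update(weights, x, lr=0.1, sigma=1.0):
    """One Kohonen-style update on a 1D map of units.

    weights: (n_units, n_inputs) array, one weight vector per map unit
    x:       (n_inputs,) input pattern
    """
    # find the single winning unit (closest weight vector to the input)
    winner = np.argmin(np.linalg.norm(weights - x, axis=1))
    # Gaussian neighborhood of activity centered on the winner
    positions = np.arange(weights.shape[0])
    h = np.exp(-((positions - winner) ** 2) / (2 * sigma ** 2))
    # the winner and its neighbors all move toward the same input,
    # so nearby units come to represent similar patterns
    weights += lr * h[:, None] * (x - weights)
    return weights
```

Because neighbors of the winner are pulled toward the same inputs, units that are close together on the map come to respond to similar patterns, which is the topographic organization described above.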
The networks studied by Miller and colleagues
(Miller et al., 1989; Miller, 1994) have also focused on
the development of topography, and can be seen as more
biologically based versions of the same basic principles
embodied in the Kohonen network. We explore these
issues in more detail in chapter 8 in the context of a
simple CPCA + kWTA model that also has lateral exci-
tatory connections that induce a topographic represen-
tation much like that observed in primary visual cortex.
4.9.4 Information Maximization and MDL

Yet another important approach toward understanding the effects of Hebbian learning was developed by Linsker (1988) in terms of information maximization. The idea here is that model learning should develop representations that maximize the amount of information conveyed about the input patterns. The first principal component of the correlation matrix conveys the most information possible for a single unit, because it causes the unit's output to have the greatest amount of variance over all the input patterns, and variance is tantamount to information.

However, this idea of maximizing information must be placed in the context of other constraints on the representations, because if taken to an extreme, it would result in representations that capture all of the information present in the input. This is both unrealistic and undesirable, because it would result in relatively unparsimonious representations.

Thus, it is more useful to consider the role of Hebbian learning in the context of a tradeoff between maximizing information and minimizing the complexity of the representations (i.e., parsimony). This fidelity/simplicity tradeoff (which we alluded to previously) is elegantly captured in a framework known as minimum description length (MDL; Zemel, 1993; Rissanen, 1986). The MDL framework makes it clear that kWTA inhibitory competition leads to more parsimonious models. By lowering the overall information capacity of the hidden layer, inhibition works to balance the information maximization objective. This is understood in MDL by measuring the information in the hidden layer relative to a set of prior assumptions (e.g., that only a few units should be active), so that less information is required to specify a representation that closely fits these assumptions.

It should also be emphasized in this context that the form of Hebbian learning used in CPCA always extracts only the first principal component from the subset of input patterns for which the receiving unit is active. Thus, the learning rule itself, in addition to the inhibitory competition, imposes a pressure to develop a relatively parsimonious model of those input patterns (note that although the first principal component is the most informative single component, it is typically far from capable of representing all of the information present in the set of input patterns). Furthermore, the contrast enhancement function imposes an even greater parsimony bias, as discussed previously.
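To make the variance claim concrete, here is a small numerical check with illustrative, randomly generated data: it projects a set of correlated input patterns onto the first principal component of their covariance matrix and onto a random direction. The PC1 projection has the larger variance, and for roughly Gaussian signals higher variance corresponds to higher entropy, that is, more information carried by a single unit's output.

```python
import numpy as np

rng = np.random.default_rng(0)
# illustrative correlated 2-D input patterns
X = rng.multivariate_normal([0.0, 0.0], [[3.0, 1.2], [1.2, 1.0]], size=1000)

# first principal component of the input covariance matrix
C = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)      # eigenvalues in ascending order
pc1 = eigvecs[:, -1]                      # direction of largest variance

# a random unit-length direction for comparison
w = rng.standard_normal(2)
w /= np.linalg.norm(w)

# variance of a single linear unit's output y = w . x for each choice of w
print("variance along PC1:             ", np.var(X @ pc1))
print("variance along random direction:", np.var(X @ w))
```

Restricting the same computation to the subset of patterns for which a given unit is active gives the conditionalized flavor of CPCA noted above.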
4.9.5 Learning Based Primarily on Hidden Layer Constraints
As discussed previously, model learning tends to work
better with appropriate constraints or biases, but in-
appropriate constraints can be a bad thing. A number of self-organizing models depend very heavily on constraints imposed on the hidden layer representations, and these models exhibit just this kind of tradeoff. Perhaps the prototypical example of this type is the BCM algorithm
(Bienenstock, Cooper, & Munro, 1982), which has the
strong constraint that each hidden unit is active for es-
sentially the same percentage of time as every other hid-
den unit. This is effective when the relevant features
are uniformly distributed in the environment, and when
you have a good match between the number of units in
the hidden layer and the number of features. However,
when these constraints do not match the environment,
the algorithm's performance suffers.
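As a rough illustration of how BCM enforces roughly equal activity levels, here is a minimal sketch of a BCM-style update for a single unit with a sliding modification threshold; the particular form (threshold tracking a running average of y squared) and the parameter names (lr, tau) are illustrative rather than a faithful reproduction of Bienenstock et al. (1982).

```python
import numpy as np

def bcm_step(w, x, theta, lr=0.01, tau=100.0):
    """One BCM-style update for a single linear unit.

    w:     (n_inputs,) weight vector
    x:     (n_inputs,) input pattern
    theta: sliding modification threshold for this unit
    """
    y = float(w @ x)                       # unit's activation
    dw = lr * x * y * (y - theta)          # potentiation above theta, depression below
    theta += (y ** 2 - theta) / tau        # threshold tracks recent average of y**2
    # theta rises when the unit has been very active and falls when it has not,
    # driving each unit toward a similar long-run activity level
    return w + dw, theta
```

When the environment's features are not uniformly distributed, or the number of hidden units does not match the number of features, this equal-activity pressure is exactly the kind of inappropriate constraint described above.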
As we discussed previously, we are particularly concerned about making assumptions regarding the precise number of hidden units, because there are many reasons to
believe that the cortex has an overabundance of neurons
relative to the demands of any given learning task. Also,