ships among different input patterns. The neighborhood of activity around the single winner in a Kohonen network causes the hidden units to represent patterns similar to those of their neighbors, so that topographic maps develop over the course of learning.
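For concreteness, the following is a minimal sketch of a Kohonen-style neighborhood update, assuming a one-dimensional map, a Gaussian neighborhood, and illustrative parameter names (lr, sigma); it is not the specific implementation used in the models discussed here.

```python
import numpy as np

def som_update(weights, x, lr=0.1, sigma=1.0):
    """One Kohonen-style update on a 1D map of units.

    weights: (n_units, n_inputs) array, one weight vector per map unit
    x:       (n_inputs,) input pattern
    """
    # find the single winning unit (closest weight vector to the input)
    winner = np.argmin(np.linalg.norm(weights - x, axis=1))
    # Gaussian neighborhood of activity centered on the winner
    positions = np.arange(weights.shape[0])
    h = np.exp(-((positions - winner) ** 2) / (2 * sigma ** 2))
    # the winner and its neighbors all move toward the same input,
    # so nearby units come to represent similar patterns
    weights += lr * h[:, None] * (x - weights)
    return weights
```

Because neighbors of the winner are pulled toward the same inputs, units that are close together on the map come to respond to similar patterns, which is the topographic organization described above.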
The networks studied by Miller and colleagues
(Miller et al., 1989; Miller, 1994) have also focused on
the development of topography, and can be seen as more
biologically based versions of the same basic principles
embodied in the Kohonen network. We explore these
issues in more detail in chapter 8 in the context of a
simple CPCA + kWTA model that also has lateral exci-
tatory connections that induce a topographic represen-
tation much like that observed in primary visual cortex.
4.9.4 Information Maximization and MDL

Yet another important approach toward understanding the effects of Hebbian learning was developed by Linsker (1988) in terms of information maximization. The idea here is that model learning should develop representations that maximize the amount of information conveyed about the input patterns. The first principal component of the correlation matrix conveys the most information possible for a single unit, because it causes the unit's output to have the greatest amount of variance over all the input patterns, and variance is tantamount to information.

However, this idea of maximizing information must be placed in the context of other constraints on the representations, because if taken to an extreme, it would result in representations that capture all of the information present in the input. This is both unrealistic and undesirable, because it would result in relatively unparsimonious representations.

Thus, it is more useful to consider the role of Hebbian learning in the context of a tradeoff between maximizing information and minimizing the complexity of the representations (i.e., parsimony). This fidelity/simplicity tradeoff (which we alluded to previously) is elegantly captured in a framework known as minimum description length (MDL; Zemel, 1993; Rissanen, 1986). The MDL framework makes it clear that kWTA inhibitory competition leads to more parsimonious models. By lowering the overall information capacity of the hidden layer, inhibition works to balance the information maximization objective. This is understood in MDL by measuring the information in the hidden layer relative to a set of prior assumptions (e.g., that only a few units should be active), so that less information is required to specify a representation that closely fits these assumptions.

It should also be emphasized in this context that the form of Hebbian learning used in CPCA always extracts only the first principal component from the subset of input patterns for which the receiving unit is active. Thus, the learning rule itself, in addition to the inhibitory competition, imposes a pressure to develop a relatively parsimonious model of those input patterns (note that although the first principal component is the most informative single component, it is typically far from capable of representing all of the information present in the set of input patterns). Furthermore, the contrast enhancement function imposes an even greater parsimony bias, as discussed previously.
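To make the variance claim concrete, here is a small numerical check with illustrative, randomly generated data: it projects a set of correlated input patterns onto the first principal component of their covariance matrix and onto a random direction. The PC1 projection has the larger variance, and for roughly Gaussian signals higher variance corresponds to higher entropy, that is, more information carried by a single unit's output.

```python
import numpy as np

rng = np.random.default_rng(0)
# illustrative correlated 2-D input patterns
X = rng.multivariate_normal([0.0, 0.0], [[3.0, 1.2], [1.2, 1.0]], size=1000)

# first principal component of the input covariance matrix
C = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)      # eigenvalues in ascending order
pc1 = eigvecs[:, -1]                      # direction of largest variance

# a random unit-length direction for comparison
w = rng.standard_normal(2)
w /= np.linalg.norm(w)

# variance of a single linear unit's output y = w . x for each choice of w
print("variance along PC1:             ", np.var(X @ pc1))
print("variance along random direction:", np.var(X @ w))
```

Restricting the same computation to the subset of patterns for which a given unit is active gives the conditionalized flavor of CPCA noted above.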
4.9.5 Learning Based Primarily on Hidden Layer Constraints
As discussed previously, model learning tends to work
better with appropriate constraints or biases, but in-
appropriate constraints can be a bad thing. A number of self-organizing models depend very heavily on constraints imposed on the hidden layer representations, and these models exhibit just this kind of tradeoff. Perhaps the prototypical example of this type is the BCM algorithm
(Bienenstock, Cooper, & Munro, 1982), which has the
strong constraint that each hidden unit is active for es-
sentially the same percentage of time as every other hid-
den unit. This is effective when the relevant features
are uniformly distributed in the environment, and when
you have a good match between the number of units in
the hidden layer and the number of features. However,
when these constraints do not match the environment,
the algorithm's performance suffers.
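As a rough illustration of how BCM enforces roughly equal activity levels, here is a minimal sketch of a BCM-style update for a single unit with a sliding modification threshold; the particular form (threshold tracking a running average of y squared) and the parameter names (lr, tau) are illustrative rather than a faithful reproduction of Bienenstock et al. (1982).

```python
import numpy as np

def bcm_step(w, x, theta, lr=0.01, tau=100.0):
    """One BCM-style update for a single linear unit.

    w:     (n_inputs,) weight vector
    x:     (n_inputs,) input pattern
    theta: sliding modification threshold for this unit
    """
    y = float(w @ x)                       # unit's activation
    dw = lr * x * y * (y - theta)          # potentiation above theta, depression below
    theta += (y ** 2 - theta) / tau        # threshold tracks recent average of y**2
    # theta rises when the unit has been very active and falls when it has not,
    # driving each unit toward a similar long-run activity level
    return w + dw, theta
```

When the environment's features are not uniformly distributed, or the number of hidden units does not match the number of features, this equal-activity pressure is exactly the kind of inappropriate constraint described above.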
As we discussed previously, we are particularly concerned about making assumptions regarding the precise number of hidden units, because there are many reasons to
believe that the cortex has an overabundance of neurons
relative to the demands of any given learning task. Also,