tially invariant representations that nevertheless encode
the spatial arrangements between features (figure 8.10).
Thus, our model depends critically on learning a hier-
archical series of transformations that produce increas-
ingly more complex (in terms of object features) and
spatially invariant representations. Hierarchical repre-
sentations are likely to be important for many aspects of
cortical processing (see chapter 10 for more examples).
Our model's ability to learn such representations using
both task and model learning provides a key demonstra-
tion of this general principle.
Many researchers have suggested that object recog-
nition operates in a roughly hierarchical fashion, and
several existing models implement specific versions of
this idea (Fukushima, 1988; Mozer, 1987; LeCun et al.,
1989). These models separate the process of creat-
ing increasingly complex featural representations and
that of creating increasingly invariant representations
into two different interleaved stages of processing. One
stage collapses over locations, and the other builds more
complex feature representations. This makes the train-
ing easier, because the model can be specifically con-
strained to produce the appropriate types of representa-
tions at each layer.
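The two interleaved stages described above can be sketched in a few lines of numpy. This is an illustrative toy, not an implementation of any of the cited models: the filter, image sizes, and pooling scheme are all assumptions chosen to show how a feature-building stage followed by a location-collapsing stage yields responses that are increasingly invariant to where a feature appears.

```python
import numpy as np

def feature_stage(image, filters):
    """Build more complex features: correlate each filter with every
    image position (a simplified convolutional stage)."""
    h, w = image.shape
    fh, fw = filters[0].shape
    out = np.zeros((len(filters), h - fh + 1, w - fw + 1))
    for k, f in enumerate(filters):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[k, i, j] = np.sum(image[i:i+fh, j:j+fw] * f)
    return np.maximum(out, 0.0)  # simple rectification

def pooling_stage(maps, size=2):
    """Collapse over locations: max-pool each feature map,
    producing a more spatially invariant response."""
    k, h, w = maps.shape
    out = np.zeros((k, h // size, w // size))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            out[:, i, j] = maps[:, i*size:(i+1)*size,
                                j*size:(j+1)*size].max(axis=(1, 2))
    return out

def bar_image(col):
    """An 8x8 image containing a vertical bar at a given column."""
    img = np.zeros((8, 8))
    img[2:6, col] = 1.0
    return img

vertical = np.array([[0, 1, 0]] * 3, dtype=float)  # hypothetical filter

# The same bar at two retinal positions; after two rounds of feature
# detection and pooling, the top-level responses coincide.
resp_a = pooling_stage(pooling_stage(feature_stage(bar_image(2), [vertical])))
resp_b = pooling_stage(pooling_stage(feature_stage(bar_image(3), [vertical])))
```

Note that in this staged scheme each layer does only one job, feature building or pooling, which is exactly the constraint the present model relaxes by developing both aspects in the same units.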
In contrast, the present model is not constrained in
this stagelike way and develops both aspects of the rep-
resentation simultaneously and in the same units. This
accords well with the properties of the visual system,
which appears to achieve both increased featural com-
plexity and spatial invariance in the same stages of pro-
cessing.
Our model also demonstrates how a hierarchical se-
quence of transformations can work effectively on novel
inputs (i.e., generalization). Thus, we will see that a
new object can be learned (relatively rapidly) in a small
set of retinal positions and sizes, and then recognized
fairly reliably without further learning in the other (un-
trained) positions and sizes. It is important to demon-
strate generalization because it is unreasonable to ex-
pect that one must have already seen an object from all
possible viewpoints before being able to reliably rec-
ognize it. Generalization also falls naturally out of the
traditional approaches, but the gradual transformation
approach has not previously been shown to be capa-
ble of generalizing the invariance transformation (but
other kinds of generalization have been explored; Le-
Cun et al., 1989). Furthermore, the behavioral literature
shows that people can generalize their object recogni-
tion across locations, although with some level of degra-
dation (Peterson & Zemel, 1998; Zemel, Behrmann,
Mozer, & Bavelier, submitted).
Our model accomplishes generalization of the in-
variance transformation by extracting and transforming
complex structural features shared by all objects. The
higher levels of the network contain spatially invari-
ant representations of these complex object features, the
combinations of which uniquely identify particular ob-
jects. This assumes a roughly fixed set, or vocabulary,
of underlying structural regularities shared by all ob-
jects, which are also distinctive enough that combina-
tions of such features disambiguate the objects. We en-
sure this in the model by constructing objects from a
fixed set of line features, but it is also likely to be true of
objects in the real world, which can all be seen as com-
posed from a palette of different surfaces,
textures, colors, component shapes, etc. Although one
particular suggestion exists as to what these component
shapes might be (Biederman, 1987), we do not have to
commit to such specifics because learning will automat-
ically find them (as it does in the model).
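The logic of this paragraph can be illustrated with a small sketch: if each object corresponds to a distinct combination of features from a fixed vocabulary, then a spatially invariant record of which features are present suffices to identify it, wherever it appears. The object names and feature vocabulary below are hypothetical, and the set-based representation is a deliberate simplification of the model's learned, distributed representations.

```python
# Hypothetical objects, each a distinct combination of line features
# drawn from a fixed vocabulary (spatial position abstracted away).
OBJECTS = {
    frozenset({"horizontal", "vertical"}): "plus",
    frozenset({"left_diag", "right_diag"}): "ex",
    frozenset({"horizontal", "left_diag", "right_diag"}): "a_frame",
}

def recognize(detected_features):
    """Identify an object from the invariant set of features
    detected anywhere in the input."""
    return OBJECTS.get(frozenset(detected_features), "unknown")

# The same object at a different retinal position yields the same
# feature set, so recognition transfers without further learning.
print(recognize({"vertical", "horizontal"}))  # prints: plus
print(recognize({"left_diag"}))               # prints: unknown
```

The scheme depends on the distinctiveness assumption in the text: if two objects shared exactly the same feature combination, only information about spatial arrangement could tell them apart.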
Finally, one important limitation of the current model
is that it processes only a single object at a time. How-
ever, we will see in a later section how spatial and
object-based representations interact to enable the per-
ception of complex and potentially confusing visual dis-
plays containing multiple objects.
8.4.1 Basic Properties of the Model
The basic structure of the model (figure 8.11) is much
like that of figure 8.10. Whereas the previous model
represented roughly a single cortical hypercolumn, this
model simulates a relatively wide cortical area spanning
many hypercolumns (the individual hypercolumns are
shown as smaller boxes within the layers in figure 8.11).
Even more so than in the previous case, this means that
the model is very scaled down in terms of the number of
neurons per hypercolumn. As before, the connectivity
patterns determine the effective scale of the model, as
indicated below.