Perception and Attention - Computational Explorations in Cognitive Neuroscience

Information Technology Reference

In-Depth Information

1995). The model presented here is comparatively sim-

ple, and has not been analyzed as extensively as some

of the other models in the literature. Nevertheless, the

current model may be unique in the number of known

properties of V1 representations that it captures based

on processing natural images. Although the Olshausen

and Field (1996) model used natural images, it did not

capture the topographic aspects of V1 representations.

Most other models use artificial input stimuli that may

not provide an accurate rendition of the statistics of nat-

ural images.

In the next section, we will see how these V1 rep-

resentations can provide the building blocks for more

elaborate representations in subsequent areas of the vi-

sual system.

Human

Arm

Forearm

Hand

Figure 8.9: Schematic of Marr's view of object recognition

based on a creating an internal 3-D model of the object, here

using a hierarchical system of cylinders.

the visual properties of the object's surfaces (figure 8.9).

Object recognition then amounts to matching the model

constructed from the perceptual input to the closest fit-

ting 3-D structure associated with known objects. The

advantage of this approach is that the 3-D structural rep-

resentation is completely invariant with respect to par-

ticular 2-D views, so that one only needs to store a sin-

gle canonical representation of an object, and it can then

be recognized from any viewpoint.

Hinton (1981) proposed a neural-network approach

to invariant object recognition that bears some re-

semblance to the Marr approach (see also Zemel,

Mozer, & Hinton, 1989). In Hinton's model, a com-

bined constraint-satisfaction search is performed simul-

taneously at two levels: (1) the parameters of a set of

transformation functions that operate on the input im-

age (e.g., rotation, translation, dilation, etc.), which are

used to transform the image into a canonical form; (2)

the match between the transformed image and a stored

canonical view. Again, the objective is to dynamically

construct a canonicalized representation that can be eas-

ily matched with stored, canonical object representa-

tions.

Both Marr's approach of producing a full 3-D struc-

tural model based on a given 2-D view, and Hinton's

approach of finding an appropriate set of canonical-

izing transformations, are exceedingly difficult search

problems. Essentially, there are many 2-D projections

that map onto many 3-D structures (or many trans-

formations that map onto many canonical representa-

8.4

Object Recognition and the Visual Form

Pathway

The next model simulates the object recognition path-

way that processes the visual form of objects (i.e., the

ventral “what” pathway). This model uses V1-like edge

detector representations at the lowest level and builds all

the way up to spatially invariant representations that

enable recognition of objects regardless of where they

appear in the visual input space, and over a range of dif-

ferent sizes. Because we focus on shape over other vi-

sual properties of objects (e.g., color, motion), it might

be more appropriate to consider the model one of shape

recognition , but the same basic principles should gener-

alize to these other object properties.

We take the development of spatially invariant repre-

sentations to be the defining characteristic of process-

ing in the ventral pathway: transformations collapse

across different spatial locations and sizes while pre-

serving distinctions among different objects. Although

only two forms of invariance (location and size) are sim-

ulated here, the same mechanisms should lead to at least

some degree of rotation invariance as well (note that

it appears that the brain does not exhibit a very high

degree of rotational invariance in its representations;

Tanaka, 1996).

The traditional approach to spatially invariant object

recognition (e.g., Marr, 1982) involves creating an in-

ternal 3-D model of the structure of the object based on

Computational Explorations in Cognitive Neuroscience

Search WWH ::

Custom Search

Home