Perception and Attention - Computational Explorations in Cognitive Neuroscience


tions), resulting in an underdetermined combinatorial many-to-many search problem. A successful implementation of this type of approach has yet to be demonstrated. Furthermore, the evidence from neural recording in monkeys suggests that visual object representations are somewhat more view-specific (e.g., Tanaka, 1996), and not fully 3-D invariant or canonical. For example, although IT neurons appear to be relatively location and size invariant, they are not fully invariant with respect to rotations either in the plane or in depth. Behavioral studies in humans appear to provide some support for view-specific object representations, but this issue is still strongly debated (e.g., Tarr & Bülthoff, 1995; Biederman & Gerhardstein, 1995; Biederman & Cooper, 1992; Burgund & Marsolek, in press).

For these reasons, we and others have taken a different approach to object recognition based on the gradual, hierarchical, parallel transformations that the brain is so well suited for performing. Instead of casting object recognition as a massive dynamic search problem, we can think of it in terms of a gradual sequence of transformations (operating in parallel) that emphasize certain distinctions and collapse across others. If the end result of this sequence of transformations retains sufficient distinctions to disambiguate different objects, but collapses across irrelevant differences produced by different viewing perspectives, then invariant object recognition has been achieved. This approach is considerably simpler because it does not try to recover the complete 3-D structural information or form complex internal models. It simply strives to preserve sufficient distinctions to disambiguate different objects, while allowing lots of other information to be discarded. Note that we are not denying that people perceive 3-D information, just that object recognition is not based on canonical, structural representations of this information.
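The success criterion in the preceding paragraph has two halves: collapse across viewing differences, yet keep objects distinct. Here is a minimal sketch of that criterion under toy assumptions of our own, not the text's: a 1-D "retina", objects as short tuples of feature labels, and translation as the only viewing difference. The names `retina`, `conjoin`, `pool`, and `achieves_invariance` are hypothetical. The two-stage transformation first emphasizes a distinction (the ordered pairing of adjacent features) and then collapses across position.

```python
# Toy check of the invariance criterion. Assumptions (ours, not the text's):
# a 1-D "retina", objects as tuples of feature labels, and translation as the
# only difference between viewing conditions.

RETINA_SIZE = 8

def retina(obj, pos):
    """Place object `obj` at offset `pos` on an otherwise blank retina."""
    image = [None] * RETINA_SIZE
    for i, feature in enumerate(obj):
        image[pos + i] = feature
    return tuple(image)

def conjoin(layer):
    """Stage 1 (emphasize distinctions): at each position, code the ordered
    pair of adjacent features, or None if either slot is empty."""
    return [
        (a, b) if a is not None and b is not None else None
        for a, b in zip(layer, layer[1:])
    ]

def pool(layer):
    """Stage 2 (collapse across position): keep which conjunctions occurred,
    discarding where they occurred."""
    return frozenset(unit for unit in layer if unit is not None)

def conj_then_pool(image):
    return pool(conjoin(image))

def achieves_invariance(transform, objects, positions):
    """True if all views of each object share one code and different
    objects receive different codes."""
    codes = {obj: {transform(retina(obj, p)) for p in positions} for obj in objects}
    if any(len(views) != 1 for views in codes.values()):
        return False  # some object is coded differently at different positions
    distinct = {next(iter(views)) for views in codes.values()}
    return len(distinct) == len(objects)

objects = [("A", "B", "C"), ("C", "B", "A")]
positions = range(RETINA_SIZE - 3 + 1)  # every placement that fits on the retina

print(achieves_invariance(lambda image: image, objects, positions))  # False
print(achieves_invariance(lambda image: 0, objects, positions))      # False
print(achieves_invariance(conj_then_pool, objects, positions))       # True
```

Passing the raw image through unchanged fails the first half of the criterion (every position gives a different code), a constant mapping fails the second half (all objects look alike), and only the two-stage transformation satisfies both.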

Figure 8.10: Hierarchical sequence of transformations that produce spatially invariant representations. The first level encodes simple feature conjunctions across a relatively small range of locations. The next level encodes more complex feature conjunctions in a wider range of locations. Finally, in this simple case, the third level can integrate across all locations of the same object, producing a fully invariant representation.

One of the most important challenges for the gradual transformation approach to spatially invariant object recognition is the binding problem discussed in chapter 7. In recognizing an object, one must encode the spatial relationship between different features of the object (e.g., it matters whether a particular edge is on the right- or left-hand side of the object), while at the same time collapsing across the overall spatial location of the object as it appears on the retina. If you simply encoded each feature completely separately in a spatially invariant fashion, and then tried to recognize objects on the basis of the resulting collection of features, you would lose track of the spatial arrangement (binding) of these features relative to each other, and would thus confuse objects that have the same features but in different arrangements. For example, the capital letters “T” and “L” are both composed of a horizontal and a vertical line, so one needs to represent the way these lines intersect to disambiguate the letters.
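The T/L example can be made concrete with a small sketch. The encoding is our own simplification, not a model from the text: each letter is two lines on a 3x3 grid, described only by orientation and midpoint, and the relative offset between the two midpoints stands in for "the way these lines intersect." The names `Line`, `shift`, `bag_of_features`, and `pairwise_conjunctions` are hypothetical. Coding each feature separately and invariantly loses the distinction between the letters; coding pairs of features together with their relative position keeps it while still ignoring where the letter appears.

```python
from itertools import permutations
from typing import NamedTuple

class Line(NamedTuple):
    orientation: str  # "horizontal" or "vertical"
    row: int          # row of the line's midpoint (rows increase downward)
    col: int          # column of the line's midpoint

# Hypothetical encoding: each letter is two 3-unit lines on a 3x3 grid.
T = [Line("horizontal", 0, 1), Line("vertical", 1, 1)]
L = [Line("vertical", 1, 0), Line("horizontal", 2, 1)]

def shift(letter, dr, dc):
    """Move the whole letter to a different retinal location."""
    return [Line(line.orientation, line.row + dr, line.col + dc) for line in letter]

def bag_of_features(letter):
    """Each feature coded separately and invariantly: only which orientations occur."""
    return frozenset(line.orientation for line in letter)

def pairwise_conjunctions(letter):
    """Each ordered pair of lines coded together with their relative offset.
    Absolute location drops out, but the arrangement of the pair is kept."""
    return frozenset(
        (a.orientation, b.orientation, b.row - a.row, b.col - a.col)
        for a, b in permutations(letter, 2)
    )

print(bag_of_features(T) == bag_of_features(L))                            # True: T and L confused
print(pairwise_conjunctions(T) == pairwise_conjunctions(L))                # False: now distinguished
print(pairwise_conjunctions(shift(T, 3, 4)) == pairwise_conjunctions(T))  # True: location invariant
```

Coding every pair works here because each letter has only two features; with n features there are n(n-1) ordered pairs, so this exhaustive pairing would grow quickly for realistic objects.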

As perhaps most clearly enunciated by Mozer (1987) (see also Mozer, 1991; Fukushima, 1988; LeCun, Boser, Denker, Henderson, Howard, Hubbard, & Jackel, 1989), the binding problem for shape recognition can be managed by encoding limited combinations of features in a way that reflects their spatial arrangement, while at the same time recognizing these feature combinations in a range of different spatial locations. By repeatedly performing this type of transformation over many levels of processing, one ends up with spa-
