tially invariant representations that nevertheless encode
the spatial arrangements between features (figure 8.10).
Thus, our model depends critically on learning a hier-
archical series of transformations that produce increas-
ingly more complex (in terms of object features) and
spatially invariant representations. Hierarchical repre-
sentations are likely to be important for many aspects of
cortical processing (see chapter 10 for more examples).
Our model's ability to learn such representations using
both task and model learning provides a key demonstra-
tion of this general principle.
Many researchers have suggested that object recog-
nition operates in a roughly hierarchical fashion, and
several existing models implement specific versions of
this idea (Fukushima, 1988; Mozer, 1987; LeCun et al.,
1989). These models separate the process of creat-
ing increasingly complex featural representations and
that of creating increasingly invariant representations
into two different interleaved stages of processing. One
stage collapses over locations, and the other builds more
complex feature representations. This makes the train-
ing easier, because the model can be specifically con-
strained to produce the appropriate types of representa-
tions at each layer.
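The two interleaved stages described above can be sketched in a few lines of numpy. This is an illustrative toy, not an implementation of any of the cited models: the filter, image sizes, and pooling scheme are all assumptions chosen to show how a feature-building stage followed by a location-collapsing stage yields responses that are increasingly invariant to where a feature appears.

```python
import numpy as np

def feature_stage(image, filters):
    """Build more complex features: correlate each filter with every
    image position (a simplified convolutional stage)."""
    h, w = image.shape
    fh, fw = filters[0].shape
    out = np.zeros((len(filters), h - fh + 1, w - fw + 1))
    for k, f in enumerate(filters):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[k, i, j] = np.sum(image[i:i+fh, j:j+fw] * f)
    return np.maximum(out, 0.0)  # simple rectification

def pooling_stage(maps, size=2):
    """Collapse over locations: max-pool each feature map,
    producing a more spatially invariant response."""
    k, h, w = maps.shape
    out = np.zeros((k, h // size, w // size))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            out[:, i, j] = maps[:, i*size:(i+1)*size,
                                j*size:(j+1)*size].max(axis=(1, 2))
    return out

def bar_image(col):
    """An 8x8 image containing a vertical bar at a given column."""
    img = np.zeros((8, 8))
    img[2:6, col] = 1.0
    return img

vertical = np.array([[0, 1, 0]] * 3, dtype=float)  # hypothetical filter

# The same bar at two retinal positions; after two rounds of feature
# detection and pooling, the top-level responses coincide.
resp_a = pooling_stage(pooling_stage(feature_stage(bar_image(2), [vertical])))
resp_b = pooling_stage(pooling_stage(feature_stage(bar_image(3), [vertical])))
```

Note that in this staged scheme each layer does only one job, feature building or pooling, which is exactly the constraint the present model relaxes by developing both aspects in the same units.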
In contrast, the present model is not constrained in
this stagelike way and develops both aspects of the rep-
resentation simultaneously and in the same units. This
accords well with the properties of the visual system,
which appears to achieve both increased featural com-
plexity and spatial invariance in the same stages of pro-
cessing.
Our model also demonstrates how a hierarchical se-
quence of transformations can work effectively on novel
inputs (i.e., generalization). Thus, we will see that a
new object can be learned (relatively rapidly) in a small
set of retinal positions and sizes, and then recognized
fairly reliably without further learning in the other (un-
trained) positions and sizes. It is important to demon-
strate generalization because it is unreasonable to ex-
pect that one must have already seen an object from all
possible viewpoints before being able to reliably rec-
ognize it. Generalization also falls naturally out of the
traditional approaches, but the gradual transformation
approach has not previously been shown to be capa-
ble of generalizing the invariance transformation (but
other kinds of generalization have been explored; Le-
Cun et al., 1989). Furthermore, the behavioral literature
shows that people can generalize their object recogni-
tion across locations, although with some level of degra-
dation (Peterson & Zemel, 1998; Zemel, Behrmann,
Mozer, & Bavelier, submitted).
Our model accomplishes generalization of the in-
variance transformation by extracting and transforming
complex structural features shared by all objects. The
higher levels of the network contain spatially invari-
ant representations of these complex object features, the
combinations of which uniquely identify particular ob-
jects. This assumes a roughly fixed set, or vocabulary,
of underlying structural regularities shared by all ob-
jects, which are also distinctive enough that combina-
tions of such features disambiguate the objects. We en-
sure this in the model by constructing objects from a
fixed set of line features, but it is also likely to be true of
objects in the real world, which can all be seen as com-
posed from a palette of different surfaces,
textures, colors, component shapes, etc. Although one
particular suggestion exists as to what these component
shapes might be (Biederman, 1987), we do not have to
commit to such specifics because learning will automat-
ically find them (as it does in the model).
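The logic of this paragraph can be illustrated with a small sketch: if each object corresponds to a distinct combination of features from a fixed vocabulary, then a spatially invariant record of which features are present suffices to identify it, wherever it appears. The object names and feature vocabulary below are hypothetical, and the set-based representation is a deliberate simplification of the model's learned, distributed representations.

```python
# Hypothetical objects, each a distinct combination of line features
# drawn from a fixed vocabulary (spatial position abstracted away).
OBJECTS = {
    frozenset({"horizontal", "vertical"}): "plus",
    frozenset({"left_diag", "right_diag"}): "ex",
    frozenset({"horizontal", "left_diag", "right_diag"}): "a_frame",
}

def recognize(detected_features):
    """Identify an object from the invariant set of features
    detected anywhere in the input."""
    return OBJECTS.get(frozenset(detected_features), "unknown")

# The same object at a different retinal position yields the same
# feature set, so recognition transfers without further learning.
print(recognize({"vertical", "horizontal"}))  # prints: plus
print(recognize({"left_diag"}))               # prints: unknown
```

The scheme depends on the distinctiveness assumption in the text: if two objects shared exactly the same feature combination, only information about spatial arrangement could tell them apart.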
Finally, one important limitation of the current model
is that it processes only a single object at a time. How-
ever, we will see in a later section how spatial and
object-based representations interact to enable the per-
ception of complex and potentially confusing visual dis-
plays containing multiple objects.
8.4.1 Basic Properties of the Model
The basic structure of the model (figure 8.11) is much
like that of figure 8.10. Whereas the previous model
represented roughly a single cortical hypercolumn, this
model simulates a relatively wide cortical area spanning
many hypercolumns (the individual hypercolumns are
shown as smaller boxes within the layers in figure 8.11).
Even more so than in the previous case, this means that
the model is very scaled down in terms of the number of
neurons per hypercolumn. As before, the connectivity
patterns determine the effective scale of the model, as
indicated below.