One of the most important contributions of this model
is that it lets us understand the functional implications
of the observed neural response properties, and how
these properties contribute to a sensible computational
algorithm for object recognition. Thus, we can see how the
binding problem in object recognition can be averted
by developing representations with increasingly com-
plex featural encodings and increasing levels of spatial
invariance. Further, we can see that by developing com-
plex but still distributed (i.e., subobject level) featural
encodings of objects, the system can generalize the in-
variance transformation to novel objects.
One major unresolved issue has to do with the nature
of the complex representations that enable generaliza-
tion in the model. What should these representations
look like for actual objects, and can computational mod-
els such as this one provide some insight into this issue?
Obviously, the objects used in the current model are too
simple to tell us much about real objects. To address this
issue, the model would have to be made significantly
more complex, with a better approximation to actual vi-
sual feature encoding (e.g., more like the V1 receptive
field model), and it should be trained on a large range of
actual objects. This would require considerably faster
computational machinery and a large amount of mem-
ory to implement, and is thus unlikely to happen in the
near future.
As we mentioned earlier, Biederman (1987) has
made a proposal about a set of object components
(called geons) that could in theory correspond to the
kinds of distributed featural representations that our
model developed in its V4 layer. Geons are relatively
simple geometrical shapes based on particularly infor-
mative features of objects that are likely to provide use-
ful disambiguating information over a wide range of
different viewpoints (so-called non-accidental proper-
ties; Lowe, 1987). Although we obviously find the gen-
eral idea of object features important, we are not con-
vinced that the brain uses geons. We are not aware of
any neural recording data that supports the geon model.
Furthermore, the available behavioral support mostly
just suggests that features like corners are more infor-
mative than the middle portions of contours (Bieder-
man & Cooper, 1991). This does not specifically sup-
port the geon model, as corners (and junctions more
generally) are likely to be important for just about any
model of object recognition. We also suspect that the
representations developed by neural learning mecha-
nisms would be considerably more complex and diffi-
cult to describe than geons, given the complex, high-
dimensional space of object features. Nevertheless, we
are optimistic that future models will be able to speak
to these issues more directly.
One objection that might be raised against our model
is that it builds the location invariance solution into the
network architecture by virtue of the spatially localized
receptive fields. The concern might be that this archi-
tectural solution would not generalize to other forms of
invariance (e.g., size or rotation). However, by demon-
strating the ability of the model to do size invariant ob-
ject recognition, we have shown that the architecture is
not doing all the work. Although the scale and featural
simplicity of this model preclude the exploration of
rotational invariance (with only horizontal and vertical
input features, the only possible rotations are by 90
degrees, and such rotations turn one object into another), we
do think that the same basic principles could produce at
least the somewhat limited amounts of rotational invari-
ance observed in neural recording studies. As we have
stated, the network achieves invariance by representing
conjunctions of features over limited ranges of transfor-
mation. Thus, V2 neurons could also encode conjunc-
tions of features over small angles of rotation, and V4
neurons could build on this to produce more complex
representations that are invariant over larger angles of
rotation, and so on.
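The conjunction-and-pooling principle described here can be illustrated with a toy sketch: a layer of detectors responds to feature conjunctions within small local windows (as V2 units might), and a subsequent layer pools each detector over all positions (as V4/IT units might), so the resulting code is the same wherever the object appears. The architecture and names below are illustrative assumptions for exposition, not the actual model described in this chapter.

```python
import numpy as np

# Illustrative sketch of conjunction-then-pooling (not the book's model):
# a "V2-like" layer detects feature conjunctions in small windows, and a
# "V4/IT-like" layer takes the max over positions, discarding location.

rng = np.random.default_rng(0)

def conjunction_layer(image, weights, window=2):
    """Rectified responses of conjunction detectors at every window position."""
    h, w = image.shape
    out = np.zeros((h - window + 1, w - window + 1, weights.shape[0]))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + window, j:j + window].ravel()
            out[i, j] = np.maximum(weights @ patch, 0.0)
    return out

def pool_layer(responses):
    """Max-pool each detector over all positions: location is discarded,
    feature identity is kept."""
    return responses.max(axis=(0, 1))

W = rng.normal(size=(8, 4))       # 8 random conjunction detectors on 2x2 windows

obj = np.array([[1.0, 0.0],       # a small "object" (a diagonal pattern)
                [0.0, 1.0]])
img_a = np.zeros((6, 6)); img_a[1:3, 1:3] = obj   # object near top-left
img_b = np.zeros((6, 6)); img_b[3:5, 3:5] = obj   # same object, shifted

code_a = pool_layer(conjunction_layer(img_a, W))
code_b = pool_layer(conjunction_layer(img_b, W))
print(np.allclose(code_a, code_b))   # True: the pooled code is shift-invariant
```

Because the pooling operates on each feature detector separately, any pattern built out of these conjunctions, including a novel one, inherits the same invariance; this is the sense in which a distributed, sub-object featural code generalizes the invariance transformation to new objects.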
Finally, we should emphasize the importance
of using both error-driven and Hebbian learning in this
model. Neither purely error-driven nor purely Hebbian
versions of this network were capable of learning suc-
cessfully (the purely error-driven version did somewhat
better than the purely Hebbian one, which essentially did not learn
at all). This further validates the analyses from chap-
ter 6 regarding the importance of Hebbian learning in
deep, multilayered networks such as this one. Error-
driven learning is essential for the network to form rep-
resentations that discriminate the different objects; oth-
erwise it gets confused by the extreme amount of feat-
ural overlap among the objects. Although it is possible
that in the much higher dimensional space of real ob-