…described above. The network is trained on only the first
18 objects in a range of positions and sizes. The last
two objects are used later for testing generalization.
This model can be seen as an extension of the pre-
vious one, in that the input is LGN-like, with separate
on- and off-center layers. Objects are represented us-
ing bars of light (activity in the on-center LGN input)
one pixel wide. The off-center input has activity at the
ends of each bar, representing the transition from light
to dark there. This end-stopping information is widely
thought to be encoded by neurons in the early visual
cortex, for example to represent lines of a particular
length.
The LGN is 16 x 16 units and wraps around like the
previous simulation (i.e., the right-most unit is a neigh-
bor of the left-most one and same for top-bottom). Ob-
jects are one of four sizes, corresponding to 5, 7, 9, and
11 pixel-length bars. The lower left-hand side of an
object can be located anywhere within the 16 x 16 grid,
for a total of 256 unique locations. Thus, all combined,
there are 1024 unique “images” of a given object.
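As a concrete illustration, the following is a minimal Python sketch (not the actual simulator code) of how one such LGN “image” could be generated for a single bar segment. The function name, the use of NumPy, and the placement of the off-center end-stop activity one pixel beyond each end of the bar are illustrative assumptions.

```python
import numpy as np

GRID = 16                     # LGN is 16 x 16 and wraps around (toroidal)
SIZES = (5, 7, 9, 11)         # the four bar lengths, in pixels

def bar_image(length, row, col, horizontal=True):
    """Illustrative sketch: a one-pixel-wide bar of light of the given
    length, starting at (row, col) and wrapping around the grid.
    Returns (on_center, off_center) activity arrays."""
    on = np.zeros((GRID, GRID))
    off = np.zeros((GRID, GRID))
    for i in range(length):
        r = row if horizontal else (row + i) % GRID
        c = (col + i) % GRID if horizontal else col
        on[r, c] = 1.0                     # bar of light -> on-center activity
    # End-stop activity in the off-center field marks the light-to-dark
    # transition at each end of the bar (placed one pixel beyond each end
    # here; the exact placement is an assumption).
    if horizontal:
        off[row, (col - 1) % GRID] = 1.0
        off[row, (col + length) % GRID] = 1.0
    else:
        off[(row - 1) % GRID, col] = 1.0
        off[(row + length) % GRID, col] = 1.0
    return on, off

# 4 sizes x 256 starting locations = 1024 unique images of a given object
n_images_per_object = len(SIZES) * GRID * GRID
```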
As in the previous model, a V1-like area processes
the LGN input using simple oriented edge-detector rep-
resentations. We fixed these representations from the
outset to encode horizontal and vertical lines in all pos-
sible locations within the receptive field so that we
could use the smallest number of V1 units possible. In-
stead of trying to make combinations of different polar-
ities between the on- and off-center inputs, we just had
one set of units encode bars in the on-center field, and
another encode bars in the off-center field. Given that
the receptive field size is 4 x 4 from each LGN input, there are eight
horizontal and vertical bars (four of each) for the on-center field and
eight for the off-center field, for a total of 16 units (4 x 4) in each
V1 hypercolumn. Thus, we have considerably
simplified the V1 representations to make the model
simpler and more compact, but the essential property
of orientation tuning is retained.
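To make this counting concrete, here is a hedged sketch of what the fixed V1 templates might look like; the function names and the simple dot-product response are illustrative assumptions, not the simulator's actual implementation.

```python
import numpy as np

RF = 4   # each V1 hypercolumn sees a 4 x 4 patch of each LGN field

def v1_templates():
    """Fixed (non-learned) V1 templates: one unit per horizontal or vertical
    bar position within the 4 x 4 receptive field, duplicated for the on-
    and off-center LGN fields (8 + 8 = 16 units per hypercolumn)."""
    templates = []
    for field in ("on", "off"):
        for r in range(RF):                       # 4 horizontal bars
            w = np.zeros((RF, RF)); w[r, :] = 1.0
            templates.append((field, w))
        for c in range(RF):                       # 4 vertical bars
            w = np.zeros((RF, RF)); w[:, c] = 1.0
            templates.append((field, w))
    return templates                              # 16 (field, weights) pairs

def v1_response(on_patch, off_patch, templates):
    """Dot-product response of each of the 16 units to a 4 x 4 patch pair."""
    return np.array([
        float((w * (on_patch if field == "on" else off_patch)).sum())
        for field, w in templates
    ])
```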
The next layers in the pathway represent the sub-
sequent areas in the cortical object recognition path-
way, V2 and V4. These areas have successively larger,
more complex, and more spatially invariant receptive
field properties. Due to the limited size of this model,
V4 representations encompass all of the visual input
space, and thus can produce fully invariant representa-
tions over this entire space. The relative simplicity of
the objects in our simulated environment also enables
the V4 representations to have sufficient complexity in
terms of feature combinations to distinguish among the
different objects. In a larger, more realistic model of
the cortex, full invariance and object-level complexity
would not be possible until the next layer of process-
ing in the inferior temporal cortex (IT). Thus, we have
effectively just collapsed the V4 and IT layers into one
layer.
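The following sketch only illustrates what “spatially invariant” means operationally; it is not part of the model itself. It assumes you have recorded a layer's activity pattern for the same object at several positions and measures how similar those patterns are.

```python
import numpy as np

def position_invariance(acts_by_position):
    """Mean pairwise correlation of a layer's activity patterns for the same
    object presented at different positions.  A fully invariant V4/IT-like
    layer scores near 1.0; LGN- or V1-like layers, whose units are tied to
    particular locations, score much lower."""
    acts = np.asarray(acts_by_position)       # shape: (n_positions, n_units)
    corrs = np.corrcoef(acts)                 # position-by-position similarity
    off_diag = corrs[~np.eye(len(acts), dtype=bool)]
    return float(off_diag.mean())
```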
The last layer in the network is the “output” layer,
which enables us to use task-based, error-driven learn-
ing (in addition to model-based Hebbian learning) to
train the network. It can be viewed as corresponding to
any of a number of possible task-like outputs. For ex-
ample, the different objects could have different sounds,
textures, or smells, and the network could be predicting
the corresponding representation in one of these other
modalities, with the feedback being used to improve the ability of the
visual system to identify the different objects so that it can accurately
predict these other correlates.
Similarly, the objects may have different physical con-
sequences (e.g., some can stand by themselves, others
can roll, etc.), or they may serve as symbols like digits
or letters. In any case, we simply have a distinct out-
put unit for each object, and the network is trained to
produce the correct output unit given the image of the
object presented in the input. This task-based training
is important for successful learning in the network.
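For concreteness, here is a minimal sketch of the kind of weight update that mixes model-based Hebbian and task-based error-driven learning for a single connection, with the target being the one correct output unit for the presented object. The specific functional forms, parameter names, and mixing proportion are illustrative assumptions rather than the exact values used in this simulation.

```python
def combined_dwt(x, y_minus, y_plus, w, lrate=0.01, hebb_mix=0.01):
    """Weight change for one connection with sending activity x, receiving
    activity y_minus in the expectation (minus) phase, y_plus in the
    outcome (plus) phase, and current weight w."""
    hebb = y_plus * (x - w)        # model-based Hebbian (CPCA-style) term
    err = x * (y_plus - y_minus)   # task-based, error-driven (delta-rule-like) term
    return lrate * (hebb_mix * hebb + (1.0 - hebb_mix) * err)
```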
Additional Model Details
This section contains additional details and substantia-
tion of the basic features just described.
To verify that the weight linking shortcut does not
play a substantial role in the resulting performance of
the network, a control network was run without weight
sharing. This network took much more memory and
training time, as expected, but the resulting representa-
tions and overall performance were quite similar to the
network using weight sharing that we explore here. This
result makes sense because each hypercolumn should experience the same
input patterns over time, and thus should develop roughly the same kinds
of weight patterns.
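As a rough sketch of what the weight-linking shortcut amounts to (the helper name and the use of averaging are assumptions), each hypercolumn uses one shared set of weights, and the updates computed at all hypercolumn positions are combined into a single change to that shared set:

```python
import numpy as np

def shared_weight_update(dwts_per_hypercolumn):
    """Combine the weight changes computed at every hypercolumn position
    into one update, applied to the single shared weight matrix that all
    hypercolumns use.  Averaging (rather than summing) is an assumption."""
    return np.mean(dwts_per_hypercolumn, axis=0)
```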