…described above. The network is trained on only the first
18 objects in a range of positions and sizes. The last
two objects are used later for testing generalization.
This model can be seen as an extension of the pre-
vious one, in that the input is LGN-like, with separate
on- and off-center layers. Objects are represented us-
ing bars of light (activity in the on-center LGN input)
one pixel wide. The off-center input has activity at the
ends of each bar, representing the transition from light
to dark there. This end-stopping information is widely
thought to be encoded by neurons in the early visual
cortex, for example to represent lines of a particular
length.
The LGN is 16 x 16 units and wraps around like the
previous simulation (i.e., the right-most unit is a neigh-
bor of the left-most one and same for top-bottom). Ob-
jects are one of four sizes, corresponding to 5, 7, 9, and
11 pixel-length bars. The lower left-hand side of an
object can be located anywhere within the 16 x 16 grid,
for a total of 256 unique locations. Thus, all combined,
there are 1024 unique “images” of a given object.
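As a concrete illustration, the following is a minimal Python sketch (not the actual simulator code) of how one such LGN “image” could be generated for a single bar segment. The function name, the use of NumPy, and the placement of the off-center end-stop activity one pixel beyond each end of the bar are illustrative assumptions.

```python
import numpy as np

GRID = 16                     # LGN is 16 x 16 and wraps around (toroidal)
SIZES = (5, 7, 9, 11)         # the four bar lengths, in pixels

def bar_image(length, row, col, horizontal=True):
    """Illustrative sketch: a one-pixel-wide bar of light of the given
    length, starting at (row, col) and wrapping around the grid.
    Returns (on_center, off_center) activity arrays."""
    on = np.zeros((GRID, GRID))
    off = np.zeros((GRID, GRID))
    for i in range(length):
        r = row if horizontal else (row + i) % GRID
        c = (col + i) % GRID if horizontal else col
        on[r, c] = 1.0                     # bar of light -> on-center activity
    # End-stop activity in the off-center field marks the light-to-dark
    # transition at each end of the bar (placed one pixel beyond each end
    # here; the exact placement is an assumption).
    if horizontal:
        off[row, (col - 1) % GRID] = 1.0
        off[row, (col + length) % GRID] = 1.0
    else:
        off[(row - 1) % GRID, col] = 1.0
        off[(row + length) % GRID, col] = 1.0
    return on, off

# 4 sizes x 256 starting locations = 1024 unique images of a given object
n_images_per_object = len(SIZES) * GRID * GRID
```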
As in the previous model, a V1-like area processes
the LGN input using simple oriented edge-detector rep-
resentations. We fixed these representations from the
outset to encode horizontal and vertical lines in all pos-
sible locations within the receptive field so that we
could use the smallest number of V1 units possible. In-
stead of trying to make combinations of different polar-
ities between the on- and off-center inputs, we just had
one set of units encode bars in the on-center field, and
another encode bars in the off-center field. Given that
the receptive field size is 4 x 4 from each LGN input, there are eight
horizontal and vertical bars (four of each) for the on-center field and
eight for the off-center field, for a total of 16 units (4 x 4) in each
V1 hypercolumn. Thus, we have considerably
simplified the V1 representations to make the model
simpler and more compact, but the essential property
of orientation tuning is retained.
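To make this counting concrete, here is a hedged sketch of what the fixed V1 templates might look like; the function names and the simple dot-product response are illustrative assumptions, not the simulator's actual implementation.

```python
import numpy as np

RF = 4   # each V1 hypercolumn sees a 4 x 4 patch of each LGN field

def v1_templates():
    """Fixed (non-learned) V1 templates: one unit per horizontal or vertical
    bar position within the 4 x 4 receptive field, duplicated for the on-
    and off-center LGN fields (8 + 8 = 16 units per hypercolumn)."""
    templates = []
    for field in ("on", "off"):
        for r in range(RF):                       # 4 horizontal bars
            w = np.zeros((RF, RF)); w[r, :] = 1.0
            templates.append((field, w))
        for c in range(RF):                       # 4 vertical bars
            w = np.zeros((RF, RF)); w[:, c] = 1.0
            templates.append((field, w))
    return templates                              # 16 (field, weights) pairs

def v1_response(on_patch, off_patch, templates):
    """Dot-product response of each of the 16 units to a 4 x 4 patch pair."""
    return np.array([
        float((w * (on_patch if field == "on" else off_patch)).sum())
        for field, w in templates
    ])
```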
The next layers in the pathway represent the sub-
sequent areas in the cortical object recognition path-
way, V2 and V4. These areas have successively larger,
more complex, and more spatially invariant receptive
field properties. Due to the limited size of this model,
V4 representations encompass all of the visual input
space, and thus can produce fully invariant representa-
tions over this entire space. The relative simplicity of
the objects in our simulated environment also enables
the V4 representations to have sufficient complexity in
terms of feature combinations to distinguish among the
different objects. In a larger, more realistic model of
the cortex, full invariance and object-level complexity
would not be possible until the next layer of process-
ing in the inferior temporal cortex (IT). Thus, we have
effectively just collapsed the V4 and IT layers into one
layer.
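The following sketch only illustrates what “spatially invariant” means operationally; it is not part of the model itself. It assumes you have recorded a layer's activity pattern for the same object at several positions and measures how similar those patterns are.

```python
import numpy as np

def position_invariance(acts_by_position):
    """Mean pairwise correlation of a layer's activity patterns for the same
    object presented at different positions.  A fully invariant V4/IT-like
    layer scores near 1.0; LGN- or V1-like layers, whose units are tied to
    particular locations, score much lower."""
    acts = np.asarray(acts_by_position)       # shape: (n_positions, n_units)
    corrs = np.corrcoef(acts)                 # position-by-position similarity
    off_diag = corrs[~np.eye(len(acts), dtype=bool)]
    return float(off_diag.mean())
```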
The last layer in the network is the “output” layer,
which enables us to use task-based, error-driven learn-
ing (in addition to model-based Hebbian learning) to
train the network. It can be viewed as corresponding to
any of a number of possible task-like outputs. For ex-
ample, the different objects could have different sounds,
textures, or smells, and the network could be predicting
the corresponding representation in one of these other
modalities, with the feedback being used to improve the ability of the
visual system to identify the different objects so that it can accurately
predict these other correlates.
Similarly, the objects may have different physical con-
sequences (e.g., some can stand by themselves, others
can roll, etc.), or they may serve as symbols like digits
or letters. In any case, we simply have a distinct out-
put unit for each object, and the network is trained to
produce the correct output unit given the image of the
object presented in the input. This task-based training
is important for successful learning in the network.
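For concreteness, here is a minimal sketch of the kind of weight update that mixes model-based Hebbian and task-based error-driven learning for a single connection, with the target being the one correct output unit for the presented object. The specific functional forms, parameter names, and mixing proportion are illustrative assumptions rather than the exact values used in this simulation.

```python
def combined_dwt(x, y_minus, y_plus, w, lrate=0.01, hebb_mix=0.01):
    """Weight change for one connection with sending activity x, receiving
    activity y_minus in the expectation (minus) phase, y_plus in the
    outcome (plus) phase, and current weight w."""
    hebb = y_plus * (x - w)        # model-based Hebbian (CPCA-style) term
    err = x * (y_plus - y_minus)   # task-based, error-driven (delta-rule-like) term
    return lrate * (hebb_mix * hebb + (1.0 - hebb_mix) * err)
```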
Additional Model Details
This section contains additional details and substantia-
tion of the basic features just described.
To verify that the weight linking shortcut does not
play a substantial role in the resulting performance of
the network, a control network was run without weight
sharing. This network took much more memory and
training time, as expected, but the resulting representa-
tions and overall performance were quite similar to the
network using weight sharing that we explore here. This
result makes sense because each hypercolumn should experience the same
input patterns over time, and thus should develop roughly the same kinds
of weight patterns.
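As a rough sketch of what the weight-linking shortcut amounts to (the helper name and the use of averaging are assumptions), each hypercolumn uses one shared set of weights, and the updates computed at all hypercolumn positions are combined into a single change to that shared set:

```python
import numpy as np

def shared_weight_update(dwts_per_hypercolumn):
    """Combine the weight changes computed at every hypercolumn position
    into one update, applied to the single shared weight matrix that all
    hypercolumns use.  Averaging (rather than summing) is an assumption."""
    return np.mean(dwts_per_hypercolumn, axis=0)
```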