terns. The use of Hebbian learning is undoubtedly important for the similarity between the weight-shared and separate-weight networks, because this form of learning tends to reliably produce the same weight patterns. A purely error-driven backpropagation network run for comparison did not perform similarly in these two cases; each hypercolumn in the separate-weight backpropagation network ended up doing very different things, leading to worse performance in terms of the spatial invariance properties of the resulting network. This is consistent with the general finding that backpropagation networks tend to have a high level of variance in their learned solutions.
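To see why Hebbian learning produces such reliable weight patterns, the following minimal sketch (written in Python for illustration; it is not the simulator's code) applies the CPCA Hebbian rule from chapter 4, dw = lrate * y * (x - w). Because the weights are driven toward the conditional probability of each input being active when the receiving unit is active, runs started from different random weights converge on essentially the same weight pattern for the same input statistics.

import numpy as np

# Inputs where the first four lines are active 90% of the time and the
# last four only 10% of the time.
probs = np.array([0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1])
inputs = (np.random.default_rng(0).random((500, 8)) < probs).astype(float)

def train(seed, lrate=0.05):
    w = np.random.default_rng(seed).random(8)  # arbitrary initial weights
    for x in inputs:
        y = 1.0                                # assume the unit is active
        w += lrate * y * (x - w)               # CPCA Hebbian update
    return w

# Two runs from different initial weights end up nearly identical (close to
# the input probabilities), unlike typical backpropagation solutions.
print(np.round(train(1), 2))
print(np.round(train(2), 2))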
One further architectural feature of the model is that we did not include the excitatory lateral interconnections among units within the same layer that were important for developing topography in the previous model. This was done because including such connections makes the model that much slower, and they do not appear to be essential given the model's good performance. However, were they included, they could presumably impose various kinds of topographic orderings of the object-based representations (see Tanaka, 1996, for neurophysiological evidence of this), and would likely be important for filling in obscured parts of images and other similar kinds of pattern-completion phenomena (as explored in chapter 3).
We did not include off-center activity along the length of each bar because this would have been redundant with the on-center representation for the barlike stimuli we used. If our stimuli had surfacelike properties (as in the previous simulation), then this additional off-center information would not have been redundant, as it could indicate the relative brightness of different surfaces.
If we let the V1 receptive fields develop through learning instead of fixing them to be the minimal set required by the task, extra units would be required, as we discussed in chapter 4. We have already demonstrated that these kinds of representations will develop in response to natural images, and we also ran a larger model and verified that these representations do develop here.
The parameters for the network are all standard, with the amount of Hebbian learning set to .005, except for the connections between V1 and V2, which are at .001. This lower amount of Hebbian learning is necessary due to the weight sharing: because the same weights are updated many times for each input, the Hebbian learning becomes even more dominant. The learning rate was also dropped from .01 to .001 after 150 epochs, which helped to reduce the interference effects of weight changes from one pattern on other patterns. The slower learning rate could have been used from the start, but this significantly slows overall learning.
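The schedule just described can be summarized in a small sketch; the names here are illustrative stand-ins, not the actual simulator parameters.

# Illustrative stand-ins for the parameter values described above
# (hypothetical names, not the actual simulator parameters).
hebb_default = 0.005    # proportion of Hebbian learning in most projections
hebb_v1_to_v2 = 0.001   # lower for V1-to-V2 because weight sharing applies
                        # the same weights many times per input

def lrate(epoch):
    """Learning-rate schedule: drop from .01 to .001 after 150 epochs."""
    return 0.01 if epoch < 150 else 0.001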
8.4.2 Exploring the Model
[ Note: this simulation requires a minimum of 128Mb of
RAM to run. ]
Open the project objrec.proj.gz in chapter_8
to begin.
You will see that the network looks like a skeleton, because it is too big to save all the units and connections in the project file. We will build and connect the network in a moment, but first, the skeleton reveals some important aspects of the network structure. You can see the LGN input layers, which are 16 x 16 units in size. Above that, you can see the V1 layer, which has an 8 x 8 grid structure, where each of these 64 grid elements represents one hypercolumn of units. Each hypercolumn will contain a group of 16 (4 x 4) units when the network is built, and these units will all be connected to the same small (4 x 4) region of both LGN inputs. As discussed earlier, neighboring groups will be connected to half-overlapping regions of LGN, as we will see more clearly when the network is connected. In addition to connectivity, these groups organize the inhibition within the layer, as described above. The kWTA level is set to 6 out of 16 units within a hypercolumn, and 60 across the entire layer (i.e., ten hypercolumns out of 64 could have their maximal activity of 6, though activity need not be distributed in this manner).
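The half-overlapping arrangement can be made concrete with a small sketch (Python, purely illustrative; how the edge hypercolumns are handled is an assumption here, and this sketch simply wraps around):

LGN_SIZE = 16     # 16 x 16 LGN input layer
RF_SIZE = 4       # each hypercolumn sees a 4 x 4 LGN region
STRIDE = 2        # half-overlap: windows shift by 2 LGN units

def v1_receptive_field(col, row):
    """LGN (x, y) coordinates seen by V1 hypercolumn (col, row) in the 8x8 grid."""
    x0, y0 = col * STRIDE, row * STRIDE
    return [((x0 + dx) % LGN_SIZE, (y0 + dy) % LGN_SIZE)   # wrap at the edges (assumed)
            for dy in range(RF_SIZE) for dx in range(RF_SIZE)]

# Neighboring hypercolumns (0,0) and (1,0) share half of their LGN window:
shared = set(v1_receptive_field(0, 0)) & set(v1_receptive_field(1, 0))
print(len(shared))   # 8 shared LGN units (a 2 x 4 strip)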
The V2 layer is also organized into a grid of hypercolumns, this time 4 x 4 in size, with each hypercolumn having 64 units (8 x 8). Again, inhibition operates at both the hypercolumn and entire-layer scales here, with 8 units active per hypercolumn and 48 over the entire layer.
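The two levels of inhibition can be illustrated with a simplified sketch (Python; a real kWTA function computes an inhibitory conductance between the k and k+1 strongest units, whereas this sketch simply masks activations, so it captures the activity limits but not the dynamics):

import numpy as np

def two_level_kwta(acts, k_group, k_layer):
    """acts has shape (n_hypercolumns, units_per_hypercolumn)."""
    out = np.zeros_like(acts)
    for g in range(acts.shape[0]):                 # within-hypercolumn kWTA
        top = np.argsort(acts[g])[-k_group:]
        out[g, top] = acts[g, top]
    flat = out.ravel()                             # layer-wide kWTA
    mask = np.zeros_like(flat)
    keep = np.argsort(flat)[-k_layer:]
    mask[keep] = flat[keep]
    return mask.reshape(acts.shape)

# V1: 64 hypercolumns of 16 units, 6 active per hypercolumn, 60 per layer.
v1 = two_level_kwta(np.random.rand(64, 16), k_group=6, k_layer=60)
# V2: 16 hypercolumns of 64 units, 8 active per hypercolumn, 48 per layer.
v2 = two_level_kwta(np.random.rand(16, 64), k_group=8, k_layer=48)
print((v1 > 0).sum(), (v2 > 0).sum())   # at most 60 and 48 active units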
Each hypercolumn of V2 units receives from