Each level consists of three layers that contain different cell types. The S-layer
is the first layer of a level. It contains S-cells that receive excitatory input via ad-
justable weights from small windows centered at the corresponding position in all
C-planes of the layer below. S-cells in Level 0 access the input image directly. Not
shown in the figure are V-cells that provide inhibitory input to the S-cells. V-cells
are excited by all C-cells of the corresponding position in the lower level and com-
pute a smoothed activity sum to control the gain of S-cells. The output φ ( 1+ e
1+ ri
1)
of an S-cell depends on the total excitation e , the total inhibition i , and a selectivity
parameter r . It is passed through a rectifying function φ that is zero for negative
activations. The weights and the selectivity are chosen such that the S-cell activ-
ity is very sparse. An S-cell reacts to features that resemble its specific excitatory
weight matrix. All S-cells of a plane share the same weights and thus extract the
same feature at different locations.
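To make this computation concrete, the following minimal NumPy sketch evaluates the response formula above for a single S-cell. The function name and argument shapes are illustrative assumptions, as is the interpretation of the V-cell input as a weighted activity sum; this is a sketch, not the original implementation.

    import numpy as np

    def s_cell_response(window, w_exc, w_inh, r):
        # window: C-cell activities in the cell's receptive field
        # w_exc:  adaptable excitatory weights (shared across the S-plane)
        # w_inh:  fixed V-cell weights computing a smoothed activity sum
        # r:      selectivity parameter
        e = float(np.sum(w_exc * window))   # total excitation
        i = float(np.sum(w_inh * window))   # total inhibition via the V-cell
        a = (1.0 + e) / (1.0 + r * i) - 1.0
        return max(a, 0.0)                  # rectification: phi(x) = max(x, 0)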
Invariance is produced in the network by the connections from the S-cells to the
C-cells, which reside in the second layer of a level. These excitatory weights are
not adjustable. They are prewired in such a way that a C-cell responds if any of the
S-cells from a small window in the associated S-plane at the corresponding position
is active. Hence, C-representations are blurred copies of the S-activities that
vary less under input distortions.
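A rough sketch of this prewired pooling, assuming a square window and a sliding-window maximum as the OR-like combination (the exact pooling function is not specified here):

    import numpy as np

    def c_plane(s_plane, window=3):
        # A C-cell fires if any S-cell in a small window of the associated
        # S-plane is active; a window maximum yields the blurred copy.
        pad = window // 2
        padded = np.pad(s_plane, pad)
        out = np.empty_like(s_plane)
        for y in range(s_plane.shape[0]):
            for x in range(s_plane.shape[1]):
                out[y, x] = padded[y:y + window, x:x + window].max()
        return out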
The Neocognitron is trained level by level, starting at the bottom of the hier-
archy. The adaptable excitatory weights of the S-cells can be trained either in an
unsupervised mode or with supervision. For unsupervised training, the S-cells of a
layer that correspond to similar positions first compete to react to an input pattern.
The winning cell is then updated, such that it will react more strongly the next time
the same pattern appears. In the supervised training mode [78], a human operator se-
lects the features that a cell should respond to and the weights are updated according
to a Hebbian rule that is multiplied with a Gaussian window to give the features in
the center of the receptive field an advantage. Inhibition and excitation are increased
simultaneously to make the cells more and more specific.
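The competitive update can be sketched as follows; the learning rate and the plain Hebbian increment are illustrative assumptions, and the Gaussian window of the supervised mode appears as an optional factor favoring features at the center of the receptive field.

    import numpy as np

    def competitive_update(weights, pattern, lr=0.1, gauss=None):
        # weights: (n_cells, n_inputs) excitatory weights of competing S-cells
        # pattern: flattened input window
        responses = weights @ pattern
        winner = int(np.argmax(responses))  # winner-take-all competition
        delta = lr * pattern                # Hebbian-style reinforcement
        if gauss is not None:
            delta = delta * gauss           # supervised mode: Gaussian window
        weights[winner] += delta            # winner reacts more strongly next time
        return winner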
Although the network is able to learn to recognize distorted patterns from rela-
tively few training examples, training has been reported to be rather difficult [147]
due to the sensitivity of the network's performance to the choice of parameters like
the S-cell selectivity r. It was recommended to choose a high selectivity in the lower
levels and to decrease it towards the top of the hierarchy.
HMAX Model of Object Recognition. A modern version of a hierarchical fea-
ture extraction network is the HMAX model, proposed by Riesenhuber and Pog-
gio [192]. The architecture of the network is sketched in Figure 3.7. Similar to the
Neocognitron, it consists of alternating S-layers and C-layers. The S-layers contain
feature extracting cells that compute a weighted sum of their inputs, followed by a
rectifying transfer function. S-cells receive their inputs from C-cells at correspond-
ing positions in the next lower layer. C-cells are used to pool a group of S-cells that
share some parameters, but differ in one or more other parameters. They compute
the maximum of the activities of these S-cells. Hence, C-cell responses are invariant
to the parameters spanned by their associated S-cells.
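In the same spirit, a minimal sketch of one HMAX S/C stage, assuming flattened input windows, a template matrix of S-cell weights, and half-wave rectification as the rectifying transfer function:

    import numpy as np

    def hmax_s_layer(inputs, templates):
        # Each S-cell computes a weighted sum of its C-inputs, followed
        # by a rectifying transfer function (here: half-wave rectification).
        return np.maximum(templates @ inputs, 0.0)

    def hmax_c_cell(s_group):
        # A C-cell pools S-cells that share some parameters but differ in
        # others (e.g. position or scale) by taking the maximum of their
        # activities, making the response invariant to those parameters.
        return float(np.max(s_group))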