Each level consists of three layers that contain different cell types. The S-layer
is the first layer of a level. It contains S-cells that receive excitatory input via ad-
justable weights from small windows centered at the corresponding position in all
C-planes of the layer below. S-cells in Level 0 access the input image directly. Not
shown in the figure are V-cells that provide inhibitory input to the S-cells. V-cells
are excited by all C-cells of the corresponding position in the lower level and com-
pute a smoothed activity sum to control the gain of S-cells. The output φ ( 1+ e
1+ ri
1)
of an S-cell depends on the total excitation e , the total inhibition i , and a selectivity
parameter r . It is passed through a rectifying function φ that is zero for negative
activations. The weights and the selectivity are chosen such that the S-cell activ-
ity is very sparse. An S-cell reacts to features that resemble its specific excitatory
weight matrix. All S-cells of a plane share the same weights and thus extract the
same feature at different locations.
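To make this computation concrete, the following minimal NumPy sketch evaluates the response formula above for a single S-cell. The function name and argument shapes are illustrative assumptions, as is the interpretation of the V-cell input as a weighted activity sum; this is a sketch, not the original implementation.

    import numpy as np

    def s_cell_response(window, w_exc, w_inh, r):
        # window: C-cell activities in the cell's receptive field
        # w_exc:  adaptable excitatory weights (shared across the S-plane)
        # w_inh:  fixed V-cell weights computing a smoothed activity sum
        # r:      selectivity parameter
        e = float(np.sum(w_exc * window))   # total excitation
        i = float(np.sum(w_inh * window))   # total inhibition via the V-cell
        a = (1.0 + e) / (1.0 + r * i) - 1.0
        return max(a, 0.0)                  # rectification: phi(x) = max(x, 0)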
Invariance is produced in the network by the connections from the S-cells to the
C-cells, which reside in the second layer of a level. These excitatory weights are
not adjustable. They are prewired in such a way that a C-cell responds if any of the
S-cells from a small window in the associated S-plane at the corresponding position
is active. Hence, C-representations are blurred copies of the S-activities that
vary less under input distortions.
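A rough sketch of this prewired pooling, assuming a square window and a sliding-window maximum as the OR-like combination (the exact pooling function is not specified here):

    import numpy as np

    def c_plane(s_plane, window=3):
        # A C-cell fires if any S-cell in a small window of the associated
        # S-plane is active; a window maximum yields the blurred copy.
        pad = window // 2
        padded = np.pad(s_plane, pad)
        out = np.empty_like(s_plane)
        for y in range(s_plane.shape[0]):
            for x in range(s_plane.shape[1]):
                out[y, x] = padded[y:y + window, x:x + window].max()
        return out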
The Neocognitron is trained level by level, starting at the bottom of the hier-
archy. The adaptable excitatory weights of the S-cells can be trained either in an
unsupervised mode or with supervision. For unsupervised training, the S-cells of a
layer that correspond to similar positions first compete to react to an input pattern.
The winning cell is then updated, such that it will react more strongly the next time
the same pattern appears. In the supervised training mode [78], a human operator se-
lects the features that a cell should respond to and the weights are updated according
to a Hebbian rule that is multiplied with a Gaussian window to give the features in
the center of the receptive field an advantage. Inhibition and excitation are increased
simultaneously to make the cells more and more specific.
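The competitive update can be sketched as follows; the learning rate and the plain Hebbian increment are illustrative assumptions, and the Gaussian window of the supervised mode appears as an optional factor favoring features at the center of the receptive field.

    import numpy as np

    def competitive_update(weights, pattern, lr=0.1, gauss=None):
        # weights: (n_cells, n_inputs) excitatory weights of competing S-cells
        # pattern: flattened input window
        responses = weights @ pattern
        winner = int(np.argmax(responses))  # winner-take-all competition
        delta = lr * pattern                # Hebbian-style reinforcement
        if gauss is not None:
            delta = delta * gauss           # supervised mode: Gaussian window
        weights[winner] += delta            # winner reacts more strongly next time
        return winner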
Although the network is able to learn to recognize distorted patterns from rela-
tively few training examples, training has been reported to be rather difficult [147]
due to the sensitivity of the network's performance to the choice of parameters like
the S-cell selectivity r. It was recommended to choose a high selectivity in the lower
levels and to decrease it towards the top of the hierarchy.
HMAX Model of Object Recognition. A modern version of a hierarchical fea-
ture extraction network is the HMAX model, proposed by Riesenhuber and Pog-
gio [192]. The architecture of the network is sketched in Figure 3.7. Similar to the
Neocognitron, it consists of alternating S-layers and C-layers. The S-layers contain
feature extracting cells that compute a weighted sum of their inputs, followed by a
rectifying transfer function. S-cells receive their inputs from C-cells at correspond-
ing positions in the next lower layer. C-cells are used to pool a group of S-cells that
share some parameters, but differ in one or more other parameters. They compute
the maximum of the activities of these S-cells. Hence, C-cell responses are invariant
to the parameters spanned by their associated S-cells.
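In the same spirit, a minimal sketch of one HMAX S/C stage, assuming flattened input windows, a template matrix of S-cell weights, and half-wave rectification as the rectifying transfer function:

    import numpy as np

    def hmax_s_layer(inputs, templates):
        # Each S-cell computes a weighted sum of its C-inputs, followed
        # by a rectifying transfer function (here: half-wave rectification).
        return np.maximum(templates @ inputs, 0.0)

    def hmax_c_cell(s_group):
        # A C-cell pools S-cells that share some parameters but differ in
        # others (e.g. position or scale) by taking the maximum of their
        # activities, making the response invariant to those parameters.
        return float(np.max(s_group))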