Neural Abstraction Pyramid Architecture - Hierarchical Neural Networks for Image Interpretation

Learning at one position did not help recognition at another position. These results support the view that feature detectors are pooled in the visual system to produce invariance. This invariance does not transfer to unfamiliar patterns for which no specialized feature detectors exist.

Although the human visual system shows invariance to several transformations, only invariance to translations is discussed in the following, for simplicity. Generalization to other transformations should be straightforward.

When implementing invariance to the retinal position of a stimulus, one must not forget that the retinal stimulus position depends on eye movements. Saccades and smooth pursuit movements are able to center the object of interest at the fovea. Thus, the neural circuitry needs to implement only limited translational invariance for the recognition of objects that are away from the fixation point.

In the Neural Abstraction Pyramid architecture, the degree of possible invariance to translations increases with height. The reason for this is the fixed topographical mapping of positions between the representations at different hierarchical levels. Assuming that the resolution of the layers decreases by a factor of two for each step in height, the following behavior can be observed: a shift of the original image by eight pixels in Layer 0 corresponds to a shift of Layer 1 representations by four cells. Representations in Layer 2 and Layer 3 are shifted by two cells and one cell, respectively. Higher-level representations move only by fractions of the cell size.
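
The arithmetic above can be sketched in a few lines of Python. This is only an illustration under the stated assumption that the resolution halves with every step in height; the function name `shift_per_layer` and its parameters are hypothetical and do not come from the text.

```python
def shift_per_layer(shift_pixels, num_layers, factor=2):
    """Return the shift, measured in cells, observed at each pyramid layer.

    Assumes the resolution decreases by `factor` per step in height,
    so a shift of s pixels in Layer 0 becomes s / factor**k cells in Layer k.
    """
    return [shift_pixels / factor ** layer for layer in range(num_layers)]


# An eight-pixel shift of the input image, followed through six layers.
shifts = shift_per_layer(8, 6)
for layer, cells in enumerate(shifts):
    print(f"Layer {layer}: shift of {cells} cells")
```

From Layer 4 upwards the computed shift is smaller than one cell, which is the sub-cell regime referred to in the text.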

Total invariance to translations is only possible at the top of the pyramid, where the resolution drops to a single hypercolumn. For example, the average intensity of an image could be represented there, as it is totally invariant to translation. This feature is not computable in a single step using local connections only. However, it can be computed by a hierarchy of local averaging and subsampling operations, as in image pyramids (see Section 3.1.1).
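
A minimal sketch of this hierarchical computation is given below. It assumes a square image whose side length is a power of two, non-overlapping 2x2 averaging windows, and a circular (wrap-around) shift so that no image content leaves the frame; all function names are hypothetical.

```python
def average_and_subsample(image):
    """Average each non-overlapping 2x2 block, halving the resolution."""
    height, width = len(image), len(image[0])
    return [
        [
            (image[y][x] + image[y][x + 1]
             + image[y + 1][x] + image[y + 1][x + 1]) / 4.0
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


def pyramid_top(image):
    """Apply local averaging and subsampling until a single cell remains."""
    while len(image) > 1:
        image = average_and_subsample(image)
    return image[0][0]


def shift_columns(image, dx):
    """Shift the image horizontally by dx pixels with wrap-around."""
    width = len(image[0])
    return [[row[(x - dx) % width] for x in range(width)] for row in image]


# An arbitrary 8x8 test pattern (illustrative data only).
image = [[(3 * x + 5 * y) % 7 for x in range(8)] for y in range(8)]

top = pyramid_top(image)
global_mean = sum(map(sum, image)) / 64.0
top_shifted = pyramid_top(shift_columns(image, 3))
```

Each step uses only a local 2x2 neighborhood, yet after three steps the single remaining value equals the global mean, and it stays the same when the input is shifted.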

The reduction in resolution alone does not ensure invariance to linear transformations, because the higher-level representation may change significantly when moved by sub-cell amounts. This aliasing effect is one of the most serious limitations of orthogonal wavelet representations. The critical sampling of their basis functions causes a redistribution of the signal's energy between the levels of the representation. To avoid this effect, Simoncelli et al. [214] introduced the concept of shiftability. Intermediate coefficients of a shiftable transformation can be written as a weighted sum of the transform's coefficients, computed at a fixed number of positions only. As a consequence, the sum of the energy of the coefficients does not change when the signal is shifted. The price one must pay for shiftability is generally an increase of the sampling rate as determined by the Nyquist criterion [168], e.g., to twice the critical rate.
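
The aliasing effect can be illustrated with a small experiment. The sketch below is not the shiftable transform of Simoncelli et al.; it only compares the energy of first-level Haar detail coefficients under critical sampling with the same filter evaluated at every position (twice the critical rate), for a signal and its one-sample circular shift. The signal and all function names are assumptions chosen for illustration.

```python
import math


def haar_detail_critical(signal):
    """First-level Haar detail coefficients, one per non-overlapping pair."""
    return [(signal[i] - signal[i + 1]) / math.sqrt(2)
            for i in range(0, len(signal), 2)]


def haar_detail_dense(signal):
    """The same filter evaluated at every position, with wrap-around."""
    n = len(signal)
    return [(signal[i] - signal[(i + 1) % n]) / math.sqrt(2)
            for i in range(n)]


def energy(coefficients):
    """Sum of squared coefficients."""
    return sum(c * c for c in coefficients)


def circular_shift(signal, k):
    """Shift the signal to the right by k samples with wrap-around."""
    n = len(signal)
    return [signal[(i - k) % n] for i in range(n)]


signal = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
shifted = circular_shift(signal, 1)

critical_before = energy(haar_detail_critical(signal))
critical_after = energy(haar_detail_critical(shifted))
dense_before = energy(haar_detail_dense(signal))
dense_after = energy(haar_detail_dense(shifted))
```

With critical sampling, the pulse is aligned with the sampling grid before the shift, so the detail band carries no energy; after a shift by one sample, both edges of the pulse fall inside a pair and the band energy rises to about 1. With the denser sampling, the band energy is the same before and after the shift.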

The discrete Fourier transformation, discussed in Section 3.1.1, is shiftable by design, but computes global features. When its sinusoidal basis functions are weighted with Gaussian envelopes, Gabor functions are produced that are optimally localized in space and in frequency. For the extraction of shift-invariant features, I use a discrete approximation to Gabor filters.
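
To make the construction concrete, the following sketch samples a one-dimensional Gabor pair: a cosine and a sine carrier, each weighted with the same Gaussian envelope. It is a generic textbook form and not necessarily the specific discrete approximation used in this work; the function name and the parameters `radius`, `sigma`, and `frequency` are hypothetical.

```python
import math


def gabor_kernel(radius, sigma, frequency):
    """Sample an even (cosine) and odd (sine) Gabor function on an integer grid.

    radius:    kernel covers the positions -radius .. +radius
    sigma:     standard deviation of the Gaussian envelope, in samples
    frequency: carrier frequency in cycles per sample
    """
    positions = range(-radius, radius + 1)
    envelope = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in positions]
    even = [e * math.cos(2.0 * math.pi * frequency * x)
            for e, x in zip(envelope, positions)]
    odd = [e * math.sin(2.0 * math.pi * frequency * x)
           for e, x in zip(envelope, positions)]
    return even, odd


# A 13-tap kernel pair with a carrier period of four samples.
even, odd = gabor_kernel(radius=6, sigma=2.0, frequency=0.25)
```

The Gaussian envelope limits the spatial extent of the filter, while the carrier frequency selects the band it responds to; a two-dimensional filter would additionally need an orientation parameter.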
