Learning at one position did not help recognition at another position. These results
support the view that feature detectors are pooled in the visual system to produce
invariance. This invariance does not transfer to unfamiliar patterns for which no
specialized feature detectors exist.
Although the human visual system shows invariance to several transformations,
only invariance to translations is discussed in the following, for simplicity.
Generalization to other transformations should be straightforward.
When implementing invariance to retinal position of a stimulus, one must not
forget that the retinal stimulus position depends on eye movements. Saccades and
smooth pursuit movements are able to center the object of interest at the fovea.
Thus, the neural circuitry has only to implement limited translational invariance for
the recognition of objects that are away from the fixation point.
In the Neural Abstraction Pyramid architecture, the degree of possible invariance
to translations increases with height. The reason for this is the fixed topographical
mapping of positions between the representations at different hierarchical levels.
Assuming that the resolution of the layers decreases by a factor of two for each step
in height, the following behavior can be observed: a shift of the original image by
eight pixels in Layer 0 corresponds to a shift of Layer 1 representations by four
cells. Representations in Layer 2 and Layer 3 are shifted by two cells and one cell,
respectively. Higher-level representations move only by fractions of the cell size.
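The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the halving rule stated in the text; the function name `shift_per_layer` and its parameters are hypothetical and not part of the architecture.

```python
from fractions import Fraction


def shift_per_layer(shift_pixels, num_layers, factor=2):
    """Return the shift, in cells, seen at each layer of the pyramid.

    Assumes the resolution drops by `factor` per layer, as in the text.
    Fractions are used so that sub-cell shifts stay exact.
    """
    return [Fraction(shift_pixels, factor ** level) for level in range(num_layers)]


# An 8-pixel shift of the input image, followed through six layers.
shifts = shift_per_layer(8, 6)
for level, cells in enumerate(shifts):
    print(f"Layer {level}: shift of {cells} cell(s)")
```

Layers 0 to 3 see shifts of 8, 4, 2, and 1 cells; from Layer 4 on, the shift is 1/2 cell, then 1/4 cell, which is the sub-cell regime referred to above.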
Total invariance to translations is only possible at the top of the pyramid, where
the resolution drops to a single hypercolumn. For example, the average intensity
of an image could be represented there, as it is totally invariant to translation. This
feature is not computable in a single step using local connections only. However, it
can be computed by a hierarchy of local averaging and subsampling operations, as
in image pyramids (see Section 3.1.1).
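A minimal sketch of this idea, assuming a square image whose side is a power of two, 2x2 block averaging as the local operation, and cyclic shifts (so that no content leaves the image). The helper names `reduce_once`, `pyramid_top`, and `cyclic_shift` are hypothetical.

```python
def reduce_once(image):
    """Average non-overlapping 2x2 blocks, halving the resolution."""
    size = len(image) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(size)
        ]
        for r in range(size)
    ]


def pyramid_top(image):
    """Apply local averaging and subsampling until one cell remains."""
    while len(image) > 1:
        image = reduce_once(image)
    return image[0][0]


def cyclic_shift(image, dx):
    """Shift every row to the right by dx pixels with wrap-around."""
    return [row[-dx:] + row[:-dx] for row in image]


# Arbitrary 8x8 test pattern (an assumption made for illustration).
image = [[(3 * r + 5 * c) % 7 for c in range(8)] for r in range(8)]
mean = sum(sum(row) for row in image) / 64.0

print(pyramid_top(image), mean)              # top of the pyramid vs. global mean
print(pyramid_top(cyclic_shift(image, 3)))   # unchanged by the shift
print(reduce_once(image) == reduce_once(cyclic_shift(image, 1)))  # lower layer changes
```

Each step uses only a 2x2 neighborhood, yet the single cell at the top equals the global mean and does not change under the shift, whereas the first reduced layer already differs after a one-pixel shift.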
The reduction in resolution alone does not ensure invariance to linear transformations,
because the higher-level representation may change significantly when
moved by sub-cell amounts. This aliasing effect is one of the most serious limitations
of orthogonal wavelet representations. The critical sampling of their basis
functions causes a redistribution of the signal's energy between the levels of the
representation. To avoid this effect, Simoncelli et al. [214] introduced the concept
of shiftability. Intermediate coefficients of a shiftable transformation can be written
as a weighted sum of the transform's coefficients, computed at a fixed number of
positions only. As a consequence, the sum of the energy of the coefficients does not
change when the signal is shifted. The price one must pay for shiftability is generally
an increase of the sampling rate as determined by the Nyquist criterion [168],
e.g. to twice the critical rate.
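The effect of critical sampling can be illustrated with the simplest orthogonal wavelet, the Haar wavelet. The sketch below is not the construction of Simoncelli et al.; it only compares the energy of the finest-scale detail band when it is sampled critically (every second position) and when it is sampled at every position, under a cyclic shift by one sample. All function names and the test signal are assumptions made for illustration.

```python
def haar_detail_energy(signal):
    """Energy of the finest Haar detail band, critically sampled."""
    return sum(
        (signal[2 * k] - signal[2 * k + 1]) ** 2 / 2.0
        for k in range(len(signal) // 2)
    )


def undecimated_detail_energy(signal):
    """Energy of the same band, sampled at every position (cyclic)."""
    n = len(signal)
    return sum(
        (signal[t] - signal[(t + 1) % n]) ** 2 / 2.0
        for t in range(n)
    )


def shift_signal(signal, d):
    """Cyclic shift to the right by d samples."""
    return signal[-d:] + signal[:-d]


pulse = [0, 0, 1, 1, 0, 0, 0, 0]
shifted = shift_signal(pulse, 1)

print(haar_detail_energy(pulse), haar_detail_energy(shifted))
print(undecimated_detail_energy(pulse), undecimated_detail_energy(shifted))
```

For this pulse, the critically sampled band carries no energy before the shift and an energy of 1 after it, so the energy has moved between levels of the representation. The densely sampled band carries the same energy in both cases, at the cost of twice as many coefficients.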
The discrete Fourier transformation, discussed in Section 3.1.1, is shiftable
by design, but computes global features. When its sinusoidal basis functions are
weighted with Gaussian envelopes, Gabor functions are produced that are optimally
localized in space and in frequency. For the extraction of shift-invariant features, I
use a discrete approximation to Gabor filters.
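As a rough illustration of why Gabor functions suit this purpose, the following sketch builds a one-dimensional complex Gabor kernel (a Gaussian envelope multiplied by a complex sinusoid) and applies it to a cosine grating at the tuned frequency. It is not the discrete approximation used in this work; the kernel size, `sigma`, the frequency, and all function names are assumptions chosen for the example.

```python
import cmath
import math

FREQUENCY = 0.125  # cycles per sample (assumed tuning frequency)


def gabor_kernel(radius, sigma, frequency):
    """Complex 1D Gabor kernel: Gaussian envelope times a complex sinusoid."""
    return [
        math.exp(-(x * x) / (2.0 * sigma * sigma))
        * cmath.exp(2j * math.pi * frequency * x)
        for x in range(-radius, radius + 1)
    ]


def gabor_response(signal, kernel, center):
    """Correlate the signal with the conjugated kernel at one position."""
    radius = len(kernel) // 2
    return sum(
        signal[center + x] * kernel[x + radius].conjugate()
        for x in range(-radius, radius + 1)
    )


def grating(shift, length=33):
    """Cosine grating at the tuned frequency, displaced by `shift` samples."""
    return [math.cos(2.0 * math.pi * FREQUENCY * (t - shift)) for t in range(length)]


kernel = gabor_kernel(radius=8, sigma=3.0, frequency=FREQUENCY)

# Response at a fixed position while the grating is shifted over one period.
responses = [gabor_response(grating(shift), kernel, center=16) for shift in range(8)]
magnitudes = [abs(r) for r in responses]
real_parts = [r.real for r in responses]

for shift, (m, re) in enumerate(zip(magnitudes, real_parts)):
    print(f"shift {shift}: magnitude {m:.3f}, real part {re:+.3f}")
```

In this idealized setting, the real part of the response swings through a full cycle as the grating moves, while the magnitude of the complex response stays nearly constant. The small residual variation comes from truncating the Gaussian envelope to a finite window.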