A local connection structure is sufficient for image interpretation tasks. Fea-
ture cells at corresponding positions in adjacent layers communicate via reciprocal
forward and backward projections. The vertical connections mediate between the
layers and capture the correlations between complex features and their correspond-
ing lower-complexity subfeatures. Because the image window that is covered by
a hyper-neighborhood increases with height in the pyramid, lateral interaction be-
tween distant image parts is possible in the higher layers of the pyramid. While
lateral projections in the lower layers of the network capture correlations of nearby
low-level features, lateral connections in higher layers capture correlations of far-
apart abstract image features.
If correlations between far-apart low-level features are important, they must be
mediated through a hierarchy of abstract features for the intermediate positions.
This is efficient since it involves only Θ(log D) steps if the cell-distance between
the low-level features is D.
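The logarithmic bound can be made concrete with a small sketch. It assumes that the resolution is reduced by a fixed factor (here 2) from one layer to the next and that lateral projections only reach directly adjacent cells; both values, and the function name pyramid_path_length, are illustrative assumptions rather than part of the architecture's definition.

```python
import math

def pyramid_path_length(d: int, factor: int = 2) -> int:
    """Upper bound on the steps needed to link two bottom-layer cells d cells apart.

    Assumptions (hypothetical, for illustration only): each layer reduces the
    resolution by `factor`, and a lateral step reaches only an adjacent cell.
    """
    if d < 1:
        raise ValueError("distance must be at least 1")
    if factor < 2:
        raise ValueError("factor must be at least 2")
    levels = 0
    dist = d
    # Climb until the two cells are represented by lateral neighbors.
    while dist > 1:
        dist = math.ceil(dist / factor)
        levels += 1
    # Go up `levels` layers, take one lateral step, come down `levels` layers.
    return 2 * levels + 1

# A purely lateral route in the bottom layer would need d steps instead.
print(pyramid_path_length(1024))  # 21
```

Under these assumptions, the path length grows with the logarithm of the distance, whereas a route confined to the lowest layer grows linearly with it.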
Such detailed long-distance correlations are frequently not important. This is
indicated by the fact that the human visual system is often unable to detect long-distance
correlations of low-level stimuli. One example of this is how difficult it is
to detect a marginal difference between two similar images that are presented side-
by-side. In contrast, when both images are overlaid, the differences are very salient,
since the corresponding low-level features are now close together.
4.1.4 Iterative Refinement
The Neural Abstraction Pyramid has been designed for the iterative interpretation
of images. The refinement of initial image interpretations through local recurrent
vertical and horizontal interactions of simple processing elements in a hierarchy is
the central idea of the architecture.
Such a refinement is needed to resolve ambiguities. In natural images, local
ambiguities are common. For example, the contrast between an object's surface and
the background may be very low at parts of the object's boundary. Occlusions may
hide other object parts. Non-homogeneous lighting and object transformations, like
scaling and rotation, are further sources of ambiguity. Recovering the 3D structure
of objects from 2D images is an inherently ambiguous problem.
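How local recurrent interactions can resolve such an ambiguity is illustrated by the following minimal sketch. It is not the network's actual update rule: it uses a one-dimensional row of cells, a hand-set lateral weight, and simple clipping, all of which are assumptions made for illustration. A weak response, such as a low-contrast gap in an object boundary, is strengthened only when its neighbors respond strongly.

```python
def refine(evidence, lateral_weight=0.5, iterations=10):
    """Iteratively refine 1D activities from input evidence plus lateral context.

    Hypothetical toy update: each cell adds the mean activity of its two
    neighbors, scaled by `lateral_weight`, to its own input evidence.
    Activities are clipped to the range [-1, 1].
    """
    act = list(evidence)
    n = len(act)
    for _ in range(iterations):
        new = []
        for i in range(n):
            # Cells outside the row contribute no context.
            left = act[i - 1] if i > 0 else 0.0
            right = act[i + 1] if i < n - 1 else 0.0
            context = 0.5 * (left + right)
            value = evidence[i] + lateral_weight * context
            new.append(max(-1.0, min(1.0, value)))
        act = new
    return act

# Boundary with a low-contrast gap in the middle: context fills the gap.
with_context = refine([0.9, 0.8, 0.05, 0.9, 0.8])
# The same weak response without supporting neighbors stays weak.
without_context = refine([0.0, 0.0, 0.05, 0.0, 0.0])
print(with_context[2], without_context[2])
```

In this toy setting the strongly responding cells saturate quickly and then act as stable context, so the weak cell in the first example settles at a clearly higher activity than the isolated weak cell in the second.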
The human visual system resolves such ambiguities quickly and reliably. It does so
by focusing on those features which are most reliable in a certain situation and by
the flexible use of context information. This is exactly what the iterative image inter-
pretation does. The interpretation of ambiguous stimuli is postponed until reliably
detected features are available as context. Horizontal and vertical feedback loops al-
low contextual influences between neighboring image locations and between repre-
sentations in adjacent layers, respectively. Information flow is asymmetric: reliable
features bias the unreliable ones. This can happen in any direction. Lateral neighbors
have the same reliability a priori. Only the current stimulus decides which locations
cannot be interpreted without contextual bias. The bottom-up flow of information
is most common, since the function of the ventral visual pathway is to recognize