Neural Abstraction Pyramid Architecture - Hierarchical Neural Networks for Image Interpretation

A local connection structure is sufficient for image interpretation tasks. Feature cells at corresponding positions in adjacent layers communicate via reciprocal forward and backward projections. The vertical connections mediate between the layers and capture the correlations between complex features and their corresponding lower-complexity subfeatures. Because the image window that is covered by a hyper-neighborhood increases with height in the pyramid, lateral interaction between distant image parts is possible in the higher layers of the pyramid. While lateral projections in the lower layers of the network capture correlations of nearby low-level features, lateral connections in higher layers capture correlations of far-apart abstract image features.
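
The local connection structure can be sketched in a few lines. This sketch is hypothetical: it assumes square feature arrays whose side halves from layer to layer and a single radius shared by all three projection types; neither assumption, nor the names `projection_sources` and `covered_extent`, comes from the text. For a cell at a given layer and position, it lists the lateral, forward (from the layer below), and backward (from the layer above) source positions, and it computes how wide an image window a lateral neighborhood covers on each layer.

```python
from typing import Dict, List, Tuple

Pos = Tuple[int, int]

def _window(x0: int, x1: int, y0: int, y1: int, side: int) -> List[Pos]:
    """Positions in [x0, x1] x [y0, y1], clipped to a side x side grid."""
    return [(x, y)
            for x in range(max(x0, 0), min(x1, side - 1) + 1)
            for y in range(max(y0, 0), min(y1, side - 1) + 1)]

def projection_sources(base_side: int, num_layers: int, layer: int,
                       x: int, y: int, radius: int = 1) -> Dict[str, List[Pos]]:
    """Source positions of each projection type for cell (x, y) at `layer`.

    Assumption (hypothetical): layer l is a square grid of side base_side >> l.
    """
    side = base_side >> layer
    # Lateral (horizontal): neighbors on the same layer.
    sources = {"lateral": _window(x - radius, x + radius,
                                  y - radius, y + radius, side)}
    if layer > 0:
        # Forward (bottom-up): the 2x2 block below (x, y), widened by `radius`.
        below = base_side >> (layer - 1)
        sources["forward"] = _window(2 * x - radius, 2 * x + 1 + radius,
                                     2 * y - radius, 2 * y + 1 + radius, below)
    if layer < num_layers - 1:
        # Backward (top-down): the parent cell above, widened by `radius`.
        above = base_side >> (layer + 1)
        px, py = x // 2, y // 2
        sources["backward"] = _window(px - radius, px + radius,
                                      py - radius, py + radius, above)
    return sources

def covered_extent(layer: int, radius: int = 1) -> int:
    """Width, in layer-0 cells, of a (2r+1)-wide lateral neighborhood at `layer`."""
    return (2 * radius + 1) << layer

if __name__ == "__main__":
    src = projection_sources(base_side=16, num_layers=5, layer=2, x=1, y=1)
    for kind, positions in src.items():
        print(kind, len(positions))
    print([covered_extent(l) for l in range(5)])
```

Under these assumptions every cell has the same small number of connections on every layer, yet a 3-cell-wide lateral neighborhood spans 3, 6, 12, 24, and 48 input cells on layers 0 to 4, which is why distant image parts can interact laterally only near the top.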

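If correlations between far-apart low-level features are important, they must be mediated through a hierarchy of abstract features for the intermediate positions. This is efficient, since it involves only Θ(log D) steps if the cell distance between the low-level features is D: because the image window covered by a hyper-neighborhood grows geometrically with height, the two positions fall into a common neighborhood after ascending a logarithmic number of layers, and the information then travels back down.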
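
The logarithmic step count can be checked with a small sketch. It assumes, for illustration only, that each layer halves the cell distance between two positions (rounding up, the worst case) and that one projection can bridge `radius` cells on any layer; the function name `pyramid_path_steps` and the parameter `radius` are hypothetical. The path climbs until both positions share a neighborhood, takes one lateral hop, and descends again.

```python
import math

def pyramid_path_steps(distance: int, radius: int = 1) -> int:
    """Hops needed to relate two layer-0 cells `distance` cells apart.

    Hypothetical assumptions: resolution halves per layer, and a single
    projection bridges at most `radius` cells on the current layer.
    """
    if distance < 0 or radius < 1:
        raise ValueError("distance must be >= 0 and radius >= 1")
    layers_up = 0
    d = distance
    # Climb, halving the distance (conservatively rounded up) at each layer.
    while d > radius:
        d = math.ceil(d / 2)
        layers_up += 1
    # Up the pyramid, one lateral hop at the top (if needed), and back down.
    return 2 * layers_up + (1 if d > 0 else 0)

if __name__ == "__main__":
    for D in (1, 16, 256, 4096):
        print(D, pyramid_path_steps(D))
```

For D larger than `radius`, each doubling of D adds exactly two steps (one up, one down), which is the Θ(log D) behavior stated above; a purely lateral chain in the bottom layer would instead need about D / `radius` hops.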

Such detailed long-distance correlations are frequently not important. This is indicated by the fact that the human visual system is often unable to detect long-distance correlations of low-level stimuli. One example is how difficult it is to detect a marginal difference between two similar images that are presented side-by-side. In contrast, when both images are overlaid, the differences are very salient, since the corresponding low-level features are now close together.

4.1.4 Iterative Refinement

The Neural Abstraction Pyramid has been designed for the iterative interpretation of images. The refinement of initial image interpretations through local recurrent vertical and horizontal interactions of simple processing elements in a hierarchy is the central idea of the architecture.
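
A toy version of such iterative, local, recurrent processing helps make the time dimension concrete. The sketch below is not the network described here: it uses one-dimensional layers whose length halves per layer, a single scalar activity per cell, fixed hypothetical weights `w_fwd`, `w_lat`, and `w_bwd`, and a `tanh` transfer function. All cells update synchronously from the previous step's activities, combining a bottom-up term from the two cells below, a horizontal term from the lateral neighbors, and a top-down term from the parent above.

```python
import math
from typing import List

Layer = List[float]

def refine_step(layers: List[Layer], w_fwd: float = 1.0,
                w_lat: float = 0.5, w_bwd: float = 0.5) -> List[Layer]:
    """One synchronous update; every cell reads only previous-step activities."""
    new = [layers[0][:]]  # layer 0 holds the clamped input and is not updated
    for l in range(1, len(layers)):
        cur, below = layers[l], layers[l - 1]
        above = layers[l + 1] if l + 1 < len(layers) else None
        updated = []
        for i in range(len(cur)):
            fwd = (below[2 * i] + below[2 * i + 1]) / 2        # bottom-up: two children
            nbrs = [cur[j] for j in (i - 1, i + 1) if 0 <= j < len(cur)]
            lat = sum(nbrs) / len(nbrs) if nbrs else 0.0       # horizontal: neighbors
            bwd = above[i // 2] if above is not None else 0.0  # top-down: parent
            updated.append(math.tanh(w_fwd * fwd + w_lat * lat + w_bwd * bwd))
        new.append(updated)
    return new

def refine(image: Layer, num_layers: int, iterations: int) -> List[Layer]:
    """Run `iterations` refinement steps, starting with all hidden activities at 0."""
    if num_layers < 1 or len(image) % (1 << (num_layers - 1)) != 0:
        raise ValueError("len(image) must be divisible by 2 ** (num_layers - 1)")
    layers = [image[:]] + [[0.0] * (len(image) >> l) for l in range(1, num_layers)]
    for _ in range(iterations):
        layers = refine_step(layers)
    return layers

if __name__ == "__main__":
    image = [1.0] * 8
    for t in range(1, 4):
        print(t, refine(image, num_layers=4, iterations=t)[-1])
```

With a four-layer pyramid, the single top cell stays at 0.0 for the first two iterations and becomes positive on the third: input needs one iteration per layer to reach the top, while the lateral and top-down terms keep adjusting the lower layers on every step.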

Such a refinement is needed to resolve ambiguities. In natural images, local ambiguities are common. For example, the contrast between an object's surface and the background may be very low at parts of the object's boundary. Occlusions may hide other parts of the object. Non-homogeneous lighting and object transformations, like scaling and rotation, are further sources of ambiguity. Recovering the 3D structure of objects from 2D images is an inherently ambiguous problem.
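
The low-contrast boundary case can be made concrete with a toy one-dimensional example. In this hypothetical sketch, each position along a contour has a measured edge strength and a reliability (for instance, its local contrast), both in [0, 1]; the blending rule in `fill_from_context` is an illustrative assumption, not the update rule of the Neural Abstraction Pyramid. Reliable positions keep their measurement, while unreliable ones are pulled toward the mean of their current neighbors, so repeated local updates bridge a gap in the contour with context.

```python
from typing import List

def fill_from_context(evidence: List[float], reliability: List[float],
                      iterations: int = 50) -> List[float]:
    """Blend each cell's own evidence with the mean of its lateral neighbors.

    Hypothetical rule, applied synchronously:
        x_i <- r_i * e_i + (1 - r_i) * mean(x_{i-1}, x_{i+1})
    Both neighbors get equal weight; only the reliability of the current
    measurement decides how much context a cell accepts.
    """
    if len(evidence) != len(reliability):
        raise ValueError("evidence and reliability must have equal length")
    x = list(evidence)
    for _ in range(iterations):
        new_x = []
        for i, (e, r) in enumerate(zip(evidence, reliability)):
            nbrs = [x[j] for j in (i - 1, i + 1) if 0 <= j < len(x)]
            context = sum(nbrs) / len(nbrs) if nbrs else e
            new_x.append(r * e + (1.0 - r) * context)
        x = new_x
    return x

if __name__ == "__main__":
    edge = [1.0, 1.0, 0.1, 0.1, 1.0, 1.0]  # weak edge response in a low-contrast gap
    rel = [1.0, 1.0, 0.0, 0.0, 1.0, 1.0]   # the gap positions are unreliable
    print([round(v, 3) for v in fill_from_context(edge, rel)])
```

The two gap positions converge to the edge strength of their reliable neighbors, halving their remaining error on every iteration, while the reliable positions never change.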

The human visual system resolves such ambiguities quickly and reliably. It does so by focusing on the features that are most reliable in a given situation and by flexibly using context information. This is exactly what the iterative image interpretation does. The interpretation of ambiguous stimuli is postponed until reliably detected features are available as context. Horizontal and vertical feedback loops allow contextual influences between neighboring image locations and between representations in adjacent layers, respectively. Information flow is asymmetric: reliable features bias the unreliable ones. This can happen in any direction. Lateral neighbors have the same reliability a priori; only the current stimulus decides which locations cannot be interpreted without contextual bias. The bottom-up flow of information is most common, since the function of the ventral visual pathway is to recognize
