posed by the creators of the database used in the experiments. The method is not
restricted to static images. It was shown that a face could be tracked in real time.
11.2 Conclusions
The successful application of the proposed image interpretation architecture to
several non-trivial computer vision tasks shows that the design patterns it follows
are advantageous for such problems.
The architectural bias of the Neural Abstraction Pyramid facilitates learning of
image representations. The pyramidal networks utilize the two-dimensional nature
of images as well as their hierarchical structure. Because the same data structures
and algorithms are used in the lower layers of the pyramid and at its top, the interface
problem between high-level and low-level representations, characteristic of many
current computer vision systems, does not occur.
The use of weight sharing allows for reusing examples that are presented at
one location for the interpretation of other locations. While this is not biologically
plausible, it helps to limit the number of free parameters in the network and hence
facilitates generalization. Restricting the weights to mediate specific excitation and
unspecific inhibition constrains the representations used by the networks since it
enforces sparse features. A similar effect can be achieved with a low-activity prior.
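
As a rough illustration of weight sharing, the sketch below applies one kernel at every image location and compares the number of weights with and without sharing. The function names, the 3x3 kernel, and the 32x32 image size are assumptions chosen for illustration; none of them is taken from the experiments.

```python
# Hypothetical sketch: a single kernel reused at every location of a 2D image.

def shared_filter(image, kernel):
    """Apply the same kernel at every valid location of a list-of-lists image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            total = 0.0
            for dy in range(kh):
                for dx in range(kw):
                    total += kernel[dy][dx] * image[y + dy][x + dx]
            row.append(total)
        output.append(row)
    return output


def parameter_counts(height, width, kh, kw):
    """Return (shared, unshared) weight counts for one feature map."""
    positions = (height - kh + 1) * (width - kw + 1)
    return kh * kw, positions * kh * kw


# Assumed sizes: a 32x32 input and a 3x3 kernel.
shared, unshared = parameter_counts(32, 32, 3, 3)
print(shared, unshared)
```

Because the kernel is the same everywhere, an edge presented in one row of the image adjusts the very weights that later detect the same edge in every other row.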
The use of recurrence was motivated by the ubiquitous presence of feedback in
the human visual system and by the fact that an iterative solution to a problem is
frequently much easier to obtain than a direct one. Recurrence allows for integration
of bottom-up, lateral, and top-down influences. If local ambiguities exist, the
interpretation decision can be deferred until contextual evidence arrives. This yields
a flexible use of context. Parts of the representation that are confident bias the
interpretation of less confident parts.
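
The way confident parts can bias ambiguous ones may be sketched with a toy one-dimensional chain of cells. This is not the update rule of the Neural Abstraction Pyramid; the function `refine`, the `lateral` gain, and the encoding (sign for the interpretation, magnitude for the confidence) are assumptions made for this example only.

```python
# Hypothetical sketch: lateral context resolving local ambiguity in a 1D chain.

def refine(evidence, steps, lateral=0.5):
    """Iteratively combine bottom-up evidence with the mean state of the neighbours.

    evidence: list of at least two values in [-1, 1]; 0.0 means "ambiguous".
    """
    state = list(evidence)
    for _ in range(steps):
        new_state = []
        for i, bottom_up in enumerate(evidence):
            neighbours = [state[j] for j in (i - 1, i + 1) if 0 <= j < len(state)]
            context = sum(neighbours) / len(neighbours)
            # Clip so that the state stays a bounded activity value.
            new_state.append(max(-1.0, min(1.0, bottom_up + lateral * context)))
        state = new_state
    return state


# The two middle cells carry no bottom-up evidence of their own.
print(refine([0.9, 0.0, 0.0, 0.8], steps=0))
print(refine([0.9, 0.0, 0.0, 0.8], steps=5))
```

With zero steps the middle cells remain undecided; after a few iterations they take the sign of their confident neighbours, which is the deferred, context-driven decision described above.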
This iterative approach has anytime characteristics. Initial interpretation results
are available very early. If necessary, they are refined as the processing proceeds.
The advantages of such a strategy are most obvious in situations which are
challenging for current computer vision systems. While the interpretation of
unambiguous stimuli requires no refinement, the iterative interpretation helps to
resolve ambiguities. Hence, the use of the Neural Abstraction Pyramid should be
considered when image contrast is low, noise is present, or objects are partially
occluded. Furthermore, since the recurrent networks can integrate information over
time, they are
suitable for the processing of input sequences, such as video streams.
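
The anytime behaviour can be sketched as a generator that hands back a usable result after every iteration, so that the caller decides when to stop. The names `anytime`, `run_until_stable`, and `pull_toward_one`, as well as the tolerance and step limit, are hypothetical; the toy step function merely stands in for one update of a recurrent network.

```python
# Hypothetical sketch: an anytime loop that can be interrupted after any iteration.

def anytime(initial, step, max_steps):
    """Yield the current interpretation before and after every refinement step."""
    state = list(initial)
    yield list(state)  # the initial result is available immediately
    for _ in range(max_steps):
        state = step(state)
        yield list(state)


def run_until_stable(initial, step, max_steps=50, tol=1e-3):
    """Return (interpretation, iterations_used); stop once updates fall below tol."""
    previous = None
    used = 0
    current = list(initial)
    for used, current in enumerate(anytime(initial, step, max_steps)):
        if previous is not None:
            change = max(abs(a - b) for a, b in zip(previous, current))
            if change < tol:
                break
        previous = current
    return current, used


def pull_toward_one(state):
    """Toy stand-in for one network update: move each value halfway to 1.0."""
    return [(v + 1.0) / 2.0 for v in state]


_, steps_clear = run_until_stable([1.0, 1.0], pull_toward_one)
_, steps_ambiguous = run_until_stable([0.0, 1.0], pull_toward_one)
print(steps_clear, steps_ambiguous)
```

An input that is already unambiguous stops after a single check, while the ambiguous one uses more iterations; a caller with a fixed time budget could equally take whatever the generator has produced so far.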
The application of learning techniques to the proposed architecture shows a way
to overcome the problematic design complexity of current computer vision systems.
While application-specific feature extraction methods must be redesigned manually
whenever the task changes, supervised learning in the Neural Abstraction Pyramid
offers the possibility of specifying the task through a set of input/output examples.
Automatic optimization of all parts of the system is possible in order to produce the
desired results. In this way, a generic network becomes task-specific.
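
Specifying a task purely through input/output examples can be illustrated with a minimal sketch in which a generic three-tap shared kernel is fitted to examples produced by an edge detector. Plain gradient descent on the squared error is used here for simplicity; it is not claimed to be the training procedure applied to the Neural Abstraction Pyramid. All names, the learning rate, the epoch count, and the synthetic data are assumptions.

```python
# Hypothetical sketch: a generic kernel becomes task-specific from examples alone.
import random


def apply_kernel(signal, kernel):
    """Apply one shared kernel at every valid position of a 1D signal."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def make_examples(target_kernel, count=20, length=8, seed=0):
    """Build synthetic (input, desired output) pairs from a known target filter."""
    rng = random.Random(seed)
    examples = []
    for _ in range(count):
        signal = [rng.uniform(-1.0, 1.0) for _ in range(length)]
        examples.append((signal, apply_kernel(signal, target_kernel)))
    return examples


def train_kernel(examples, kernel_size, epochs=300, lr=0.1):
    """Fit the kernel by gradient descent on the mean squared output error."""
    kernel = [0.0] * kernel_size  # generic starting point
    for _ in range(epochs):
        for signal, target in examples:
            output = apply_kernel(signal, kernel)
            grad = [0.0] * kernel_size
            for i, (out, want) in enumerate(zip(output, target)):
                err = out - want
                for j in range(kernel_size):
                    grad[j] += err * signal[i + j]
            n = len(output)
            kernel = [w - lr * g / n for w, g in zip(kernel, grad)]
    return kernel


edge_detector = [-1.0, 0.0, 1.0]  # the "task", never shown to the learner directly
learned = train_kernel(make_examples(edge_detector), kernel_size=3)
print([round(w, 3) for w in learned])
```

Changing the task in this sketch means supplying a different example set; the training code itself stays untouched.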