Summary and Conclusions - Hierarchical Neural Networks for Image Interpretation

posed by the creators of the database used in the experiments. The method is not restricted to static images. It was shown that a face could be tracked in real time.

11.2 Conclusions

The successful application of the proposed image interpretation architecture to several non-trivial computer vision tasks shows that the design patterns it follows are advantageous for problems of this kind.

The architectural bias of the Neural Abstraction Pyramid facilitates the learning of image representations. The pyramidal networks exploit both the two-dimensional nature of images and their hierarchical structure. Because the same data structures and algorithms are used in the lower layers of the pyramid and at its top, the interface problem between high-level and low-level representations, characteristic of many current computer vision systems, does not occur.
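
To make the point about uniform data structures concrete, here is a minimal Python sketch of a pyramid in which every level, from the bottom to the top, is an instance of the same layer type. The class `Layer`, the function `build_pyramid`, and the layout rule (halving the resolution and doubling the number of feature arrays per level) are illustrative assumptions, not the book's implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Layer:
    """Hypothetical pyramid level: a stack of 2-D feature arrays."""
    height: int
    width: int
    features: int
    # activities[f][y][x]; the same nested layout is used at every level
    activities: List[List[List[float]]] = field(init=False, repr=False)

    def __post_init__(self) -> None:
        self.activities = [[[0.0] * self.width for _ in range(self.height)]
                           for _ in range(self.features)]


def build_pyramid(height: int, width: int, base_features: int,
                  levels: int) -> List[Layer]:
    """Stack `levels` layers of the same type (layout rule is assumed)."""
    layers = []
    h, w, f = height, width, base_features
    for _ in range(levels):
        layers.append(Layer(h, w, f))
        # Assumed rule: coarser resolution, more (and more abstract) features
        h, w, f = max(1, h // 2), max(1, w // 2), f * 2
    return layers


pyramid = build_pyramid(64, 64, 4, levels=4)
for level, layer in enumerate(pyramid):
    print(level, layer.height, layer.width, layer.features)
```

Because the top level is just another `Layer`, any update or learning rule written for one level applies unchanged to all others, so no separate interface between low-level and high-level representations has to be designed.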

The use of weight sharing allows examples presented at one location to be reused for the interpretation of other locations. While this is not biologically plausible, it limits the number of free parameters in the network and hence facilitates generalization. Restricting the weights to mediate specific excitation and unspecific inhibition constrains the representations used by the networks, since it enforces sparse features. A similar effect can be achieved with a low-activity prior.
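
Both effects can be illustrated with a few lines of Python. The sketch below, under simplifying assumptions, counts the free parameters of one layer with and without weight sharing, and models specific excitation and unspecific inhibition as "each output is excited by one input and inhibited by the sum of all inputs, then rectified". The names `count_params` and `excite_inhibit` and the inhibition strength of 0.5 are hypothetical choices for illustration.

```python
from typing import List


def count_params(height: int, width: int, f_in: int, f_out: int,
                 k: int, shared: bool) -> int:
    """Free parameters of a layer whose units see a k x k window of f_in maps."""
    per_feature = k * k * f_in + 1  # window weights plus one bias
    # With sharing, one weight set serves all positions; without it,
    # every position has its own copy.
    positions = 1 if shared else height * width
    return positions * f_out * per_feature


def excite_inhibit(x: List[float], inhibition: float = 0.5) -> List[float]:
    """Hypothetical unit model: output f is excited only by input f
    (specific) and inhibited by the total input (unspecific); the
    rectifier then silences all but the strongest responses."""
    total = sum(x)
    return [max(0.0, x_f - inhibition * total) for x_f in x]


print(count_params(32, 32, 4, 8, 3, shared=True))   # 296
print(count_params(32, 32, 4, 8, 3, shared=False))  # 303104
print(excite_inhibit([0.9, 0.1, 0.2, 0.0]))         # only the first entry stays nonzero
```

For a 32x32 layer the shared version needs about a thousand times fewer parameters, and in the second example only one of four features remains active, which is the kind of sparse code the text refers to.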

The use of recurrence was motivated by the ubiquitous presence of feedback in the human visual system and by the fact that an iterative solution to a problem is frequently much easier to obtain than a direct one. Recurrence allows for the integration of bottom-up, lateral, and top-down influences. If local ambiguities exist, the interpretation decision can be deferred until contextual evidence arrives. This yields a flexible use of context: confident parts of the representation bias the interpretation of less confident parts.
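
The following Python sketch illustrates, on a one-dimensional row of cells, how iterating a local update lets confident context override weak contradictory evidence. It is a toy relaxation under stated assumptions, not the network's actual update rule: the function `relax`, the `tanh` nonlinearity, the weights `a`, `b`, `c`, and the use of the row mean as a stand-in for top-down input are all hypothetical.

```python
import math
from typing import List


def relax(evidence: List[float], steps: int = 5, a: float = 1.0,
          b: float = 1.0, c: float = 0.5) -> List[List[float]]:
    """Toy recurrent interpretation of a row of cells (hypothetical rule).

    evidence[i] in [-1, 1]: sign = hypothesis, magnitude = local confidence.
    Each step combines bottom-up evidence, the mean of the lateral
    neighbours, and a crude top-down term (the mean of the whole row).
    Returns the state after every step.
    """
    n = len(evidence)
    state = list(evidence)
    history = [list(state)]
    for _ in range(steps):
        top_down = sum(state) / n  # stand-in for a coarser, more abstract layer
        new_state = []
        for i in range(n):
            neighbours = [state[j] for j in (i - 1, i + 1) if 0 <= j < n]
            lateral = sum(neighbours) / len(neighbours) if neighbours else 0.0
            new_state.append(math.tanh(a * evidence[i] + b * lateral + c * top_down))
        state = new_state  # synchronous update of all cells
        history.append(list(state))
    return history


# The middle cell has weak evidence against the hypothesis its
# confident neighbours support.
history = relax([0.9, 0.8, -0.1, 0.8, 0.9])
print(round(history[0][2], 3), round(history[1][2], 3))
```

After a single iteration the middle cell has flipped to the sign of its confident neighbours, whereas a purely local decision would have kept the wrong sign.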

This iterative approach has anytime characteristics. Initial interpretation results are available very early and, if necessary, are refined as processing proceeds. The advantages of such a strategy are most obvious in situations that are challenging for current computer vision systems. While the interpretation of unambiguous stimuli requires no refinement, the iterative interpretation helps to resolve ambiguities. Hence, the use of the Neural Abstraction Pyramid should be considered when image contrast is low, noise is present, or objects are partially occluded. Furthermore, since the recurrent networks can integrate information over time, they are suitable for processing input sequences, such as video streams.
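
The anytime property and temporal integration can be sketched with a Python generator that yields a usable estimate after every frame, so the caller may stop whenever the result is good enough. As a deliberate simplification, the recurrent state here is a per-pixel running mean; the function `integrate_frames` and the alternating noise pattern are hypothetical and stand in for the learned recurrent updates of the actual networks.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def integrate_frames(frames: Iterable[List[float]]) -> Iterator[List[float]]:
    """Yield a per-pixel estimate after each frame (anytime behaviour).

    Simplified recurrent state: an incremental running mean,
    state <- state + (frame - state) / t.
    """
    state: List[float] = []
    for t, frame in enumerate(frames, start=1):
        if not state:
            state = [0.0] * len(frame)
        state = [s + (x - s) / t for s, x in zip(state, frame)]
        yield list(state)  # a complete interpretation is available now


truth = [1.0, 0.0, 1.0]
# Hypothetical noisy sequence: the noise alternates between +0.4 and -0.4.
noisy = [[v + (0.4 if t % 2 == 0 else -0.4) for v in truth] for t in range(6)]

estimates = list(integrate_frames(noisy))
early = list(islice(integrate_frames(noisy), 1))  # stop after the first frame
```

With noisy frames the first estimate is off by 0.4 per pixel and later estimates approach the true values; with clean frames the first estimate is already correct, mirroring the observation that unambiguous stimuli need no refinement.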

The application of learning techniques to the proposed architecture shows a way to overcome the problematic design complexity of current computer vision systems. While application-specific feature extraction methods must be redesigned manually whenever the task changes, supervised learning in the Neural Abstraction Pyramid offers the possibility of specifying the task through a set of input/output examples. All parts of the system can be optimized automatically to produce the desired results. In this way, a generic network becomes task-specific.
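
A minimal Python sketch of "specifying the task through input/output examples": the same generic model, a single shared 3-tap kernel initialised to zero, is fitted by plain gradient descent on the mean squared error to two different example sets and ends up as two different task-specific filters. The functions `correlate` and `train_kernel`, the training signals, the learning rate, and the step count are assumptions chosen for illustration; the book's networks are far larger and are trained with their own procedures.

```python
from typing import List, Sequence, Tuple


def correlate(x: Sequence[float], w: Sequence[float]) -> List[float]:
    """Valid 1-D correlation: the same (shared) kernel w at every position."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]


def train_kernel(examples: List[Tuple[List[float], List[float]]],
                 size: int = 3, lr: float = 0.1, steps: int = 1000) -> List[float]:
    """Fit a shared kernel to input/output pairs by gradient descent on
    the loss 1/(2N) * sum(err^2) over all N output positions."""
    w = [0.0] * size  # generic, task-agnostic starting point
    n_out = sum(len(x) - size + 1 for x, _ in examples)
    for _ in range(steps):
        grad = [0.0] * size
        for x, y in examples:
            for i, out in enumerate(correlate(x, w)):
                err = out - y[i]
                for j in range(size):
                    grad[j] += err * x[i + j]
        w = [wj - lr * gj / n_out for wj, gj in zip(w, grad)]
    return w


inputs = [[0, 0, 1, 1, 1, 0, 0], [1, 2, 3, 2, 1, 0, 1], [0, 1, 0, 1, 0, 1, 0]]
# Two hypothetical tasks, each defined only by its input/output examples.
edge_task = [(x, correlate(x, [-1.0, 0.0, 1.0])) for x in inputs]
smooth_task = [(x, correlate(x, [1 / 3, 1 / 3, 1 / 3])) for x in inputs]

w_edge = train_kernel(edge_task)      # close to [-1, 0, 1]
w_smooth = train_kernel(smooth_task)  # close to [1/3, 1/3, 1/3]
```

Nothing in `train_kernel` refers to edges or smoothing; the examples alone determine what the kernel becomes.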
