Fig. 1.13. Iterative image interpretation: (a) the image is interpreted first at positions where
little ambiguity exists; (b) lateral interactions reduce ambiguity; (c) top-down expansion of
abstract representations biases the low-level decision.
these representations decreases, while the diversity of features and their invariance
to transformations increase.
Iterative Refinement. The proposed architecture consists of simple processing el-
ements that interact with their neighbors. These interactions implement bottom-up
operations, like feature extraction, top-down operations, like feature expansion, and
lateral operations, like feature grouping.
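The three kinds of interaction can be illustrated with a minimal sketch. It assumes a 1D layer in which each element pools two elements of the finer layer below, averages its direct neighbors, and receives input from one element of the coarser layer above. The function name `update_layer`, the weights `w_bu`, `w_lat`, `w_td`, and the sigmoid transfer function are illustrative assumptions, not details taken from the text.

```python
import math


def sigmoid(x):
    """Logistic transfer function (an assumed choice for this sketch)."""
    return 1.0 / (1.0 + math.exp(-x))


def update_layer(below, current, above, w_bu=1.0, w_lat=0.5, w_td=0.5):
    """One synchronous update of a 1D layer of simple processing elements.

    below:   activities of the finer layer (twice as many cells), or None
    current: activities of this layer from the previous iteration
    above:   activities of the coarser layer (half as many cells), or None
    All weights are hypothetical values chosen for illustration.
    """
    n = len(current)
    updated = []
    for i in range(n):
        total = 0.0
        if below is not None:
            # Bottom-up: pool the two finer cells that this element covers.
            total += w_bu * 0.5 * (below[2 * i] + below[2 * i + 1])
        # Lateral: average the direct neighbors within the same layer.
        neighbors = [current[j] for j in (i - 1, i + 1) if 0 <= j < n]
        if neighbors:
            total += w_lat * sum(neighbors) / len(neighbors)
        if above is not None:
            # Top-down: feedback from the coarser element covering this one.
            total += w_td * above[i // 2]
        updated.append(sigmoid(total))
    return updated
```

Calling `update_layer` repeatedly on every layer, always reading the activities of the previous iteration, gives one simple form of local recurrent computation in such a hierarchy.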
The main idea is to interpret images iteratively, as illustrated in Figure 1.13.
While images frequently contain ambiguous parts, most image parts can be
interpreted relatively easily in a bottom-up manner. This produces partial
representations in higher layers that can be completed using lateral interactions.
Top-down expansion can then bias the interpretation of the ambiguous stimuli.
This iterative refinement is a flexible way to incorporate context information.
When the interpretation cannot be decided locally, the decision is deferred until
further evidence arrives from the context.
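The idea of deferring a decision can be sketched in a toy setting. The example assumes a 1D row of cells, each with a signed local evidence value; confident cells are decided first, and ambiguous cells wait until decided neighbors supply enough context. The function name `interpret_iteratively` and the parameters `threshold` and `lateral_weight` are hypothetical and serve only to illustrate the principle; this is not the architecture's actual update rule.

```python
def interpret_iteratively(evidence, threshold=0.5, lateral_weight=0.6, max_iters=10):
    """Toy 1D sketch: decide confident cells first, defer ambiguous ones.

    evidence: floats whose sign is the locally preferred interpretation
              and whose magnitude is the local confidence.
    Returns one label per cell: +1, -1, or 0 if still undecided.
    All parameter values are illustrative assumptions.
    """
    n = len(evidence)
    labels = [0] * n
    for _ in range(max_iters):
        changed = False
        new_labels = list(labels)
        for i in range(n):
            if labels[i] != 0:
                continue  # already decided in an earlier iteration
            # Context comes only from neighbors decided in earlier iterations.
            context = sum(labels[j] for j in (i - 1, i + 1) if 0 <= j < n)
            support = evidence[i] + lateral_weight * context
            if abs(support) >= threshold:
                new_labels[i] = 1 if support > 0 else -1
                changed = True
            # Otherwise the decision is deferred to a later iteration.
        labels = new_labels
        if not changed:
            break
    return labels
```

With `evidence = [0.9, 0.1, -0.2, 0.8]`, the two outer cells are decided in the first iteration, the second cell in the next, and the third cell only once both of its neighbors agree, so its weak local preference is overridden by the context.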
Adaptability and Learning. While current computer vision systems usually con-
tain adaptable components, such as trainable classifiers, most steps of the processing
chain are designed manually. Depending on the application, different preprocessing
steps are applied and different features are extracted. This makes it difficult to adapt
a computer vision system to a new task.
Neural networks are tools that have been successfully applied to machine learn-
ing tasks. I propose to use simple processing elements to maintain the hierarchy
of representations. This yields a large hierarchical neural network with local
recurrent connectivity to which unsupervised and supervised learning techniques
can be applied.
While the architecture is biased toward image interpretation tasks, e.g., by utilizing
the 2D nature and hierarchical structure of images, it is still general enough to be
adapted for different tasks. In this way, manual design is replaced by learning from
a set of examples.