Introduction - Hierarchical Neural Networks for Image Interpretation

Fig. 1.9. Feed-forward image processing chain (image adapted from [61]).

images of even higher resolution. If a high dynamic range is needed, logarithmic image sensors need to be employed. For mobile applications, like cellular phones and autonomous robots, CMOS sensors can be used. They are small, inexpensive, and consume little power.

The more problematic part of computer vision is the interpretation of captured images. This problem has two main aspects: speed and quality of interpretation. Cameras and other image capture devices produce large amounts of data. Although the processing speed and storage capabilities of computers have increased tremendously in recent decades, processing high-resolution images and video is still a challenging task for today's general-purpose computers. Limited computational power constrains image interpretation algorithms much more for mobile real-time applications than for offline or desktop processing. Fortunately, continuing hardware development makes it reasonable to predict that these constraints will relax within the next few years, in the same way as the constraints for processing less demanding audio signals have already relaxed.
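
To give a rough feel for why video is more demanding than audio, the following minimal sketch compares uncompressed data rates. The resolutions, frame rate, and audio format are illustrative assumptions and do not come from the text; the function name `raw_data_rate` is hypothetical.

```python
def raw_data_rate(width: int, height: int, bytes_per_pixel: int, fps: int) -> int:
    """Uncompressed video data rate in bytes per second."""
    return width * height * bytes_per_pixel * fps


def audio_data_rate(sample_rate: int, bytes_per_sample: int, channels: int) -> int:
    """Uncompressed audio data rate in bytes per second."""
    return sample_rate * bytes_per_sample * channels


if __name__ == "__main__":
    # Assumed example formats (not taken from the text):
    vga = raw_data_rate(640, 480, 3, 30)         # 24-bit colour VGA at 30 frames/s
    full_hd = raw_data_rate(1920, 1080, 3, 30)   # 24-bit colour Full HD at 30 frames/s
    cd_audio = audio_data_rate(44_100, 2, 2)     # 16-bit stereo at 44.1 kHz

    for name, rate in [("VGA video", vga), ("Full HD video", full_hd), ("CD audio", cd_audio)]:
        print(f"{name:>14}: {rate / 1e6:8.2f} MB/s")
    print(f"Full HD video / CD audio: about {full_hd / cd_audio:.0f}x")
```

Under these assumptions the raw video streams are two to three orders of magnitude larger than the audio stream, before any interpretation has taken place.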

This may sound as if one only needed to wait for computers to solve image interpretation problems faster and better than humans do, but this is not the case. While dedicated computer vision systems already outperform humans in terms of processing speed, their interpretation quality does not reach human level. Current computer vision systems are usually employed in very limited domains. Examples include quality control, license plate identification, ZIP code reading for mail sorting, and image registration in medical applications. All these systems include a way to indicate a lack of confidence, e.g. by rejecting ambiguous examples, which are then inspected by human experts. Such partially automated systems are nevertheless useful, because they free the experts from inspecting the vast majority of unproblematic examples. The need to incorporate a human component in such systems clearly underlines the superiority of the human visual system, even for tasks in such limited domains.
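
The rejection of ambiguous examples described above can be sketched as a simple confidence threshold. This is only a minimal illustration, assuming the classifier outputs one score per class in the range 0 to 1; the function names and the threshold of 0.9 are hypothetical and are not taken from any of the systems mentioned.

```python
from typing import List, Optional, Sequence, Tuple


def classify_with_reject(scores: Sequence[float], threshold: float = 0.9) -> Optional[int]:
    """Return the index of the best-scoring class, or None if confidence is too low."""
    if not scores:
        raise ValueError("scores must not be empty")
    best = max(range(len(scores)), key=lambda i: scores[i])
    # Assumed rule: accept only if the top score reaches the threshold.
    return best if scores[best] >= threshold else None


def route(batch: Sequence[Sequence[float]],
          threshold: float = 0.9) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Split a batch into automatically accepted items and items for human inspection."""
    accepted: List[Tuple[int, int]] = []   # (example index, predicted class)
    rejected: List[int] = []               # example indices deferred to an expert
    for idx, scores in enumerate(batch):
        label = classify_with_reject(scores, threshold)
        if label is None:
            rejected.append(idx)
        else:
            accepted.append((idx, label))
    return accepted, rejected


if __name__ == "__main__":
    # Made-up scores for three examples and three classes.
    batch = [
        [0.97, 0.02, 0.01],   # clear case
        [0.50, 0.45, 0.05],   # ambiguous case
        [0.05, 0.92, 0.03],   # clear case
    ]
    accepted, rejected = route(batch)
    print("accepted:", accepted)
    print("for human inspection:", rejected)
```

Raising the threshold sends more examples to the human experts and lowers the error rate on the automatically processed ones; choosing it is a trade-off between expert workload and reliability.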

Depending on the application, computer vision algorithms try to extract different aspects of the information contained in an image or a video stream. For example, one may want to infer a structural object model from a sequence of images that show a moving object. In this case, the object structure is preserved, while motion information is discarded. On the other hand, for the control of mobile robots, analysis may start with a model of the environment in order to match it with the input and to infer robot motion.

Two main approaches exist for the interpretation of images: bottom-up and top-down. Figure 1.9 depicts the feed-forward image processing chain of bottom-up
