Fig. 1.9. Feed-forward image processing chain (image adapted from [61]).
images of even higher resolution. If a high dynamic range is needed, logarithmic
image sensors need to be employed. For mobile applications, like cellular phones
and autonomous robots, CMOS sensors can be used. They are small, inexpensive,
and consume little power.
The more problematic part of computer vision is the interpretation of captured
images. This problem has two main aspects: speed and quality of interpretation.
Cameras and other image capture devices produce large amounts of data. Although
the processing speed and storage capabilities of computers have increased
tremendously in recent decades, processing high-resolution images and video is
still a challenging task for today's general-purpose computers. Limited
computational power constrains image interpretation algorithms much more for
mobile real-time applications than for offline or desktop processing.
Fortunately, continuing hardware development suggests that these constraints
will relax within the next few years, in the same way that the constraints on
processing less demanding audio signals have already relaxed.
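To give a rough sense of the data volumes involved, the following minimal sketch computes the raw, uncompressed data rate of a video stream and compares it with that of an audio stream. The concrete parameters (a 1920x1080 RGB stream at 30 frames per second, and 44.1 kHz 16-bit stereo audio) are illustrative assumptions and are not taken from the text.

```python
def raw_data_rate(samples_per_unit, units_per_second, bytes_per_sample):
    """Uncompressed data rate in bytes per second."""
    return samples_per_unit * units_per_second * bytes_per_sample


# Assumed video source: 1920x1080 pixels, 3 color channels,
# 1 byte per channel, 30 frames per second.
video_rate = raw_data_rate(1920 * 1080 * 3, 30, 1)

# Assumed audio source: 2 channels, 44,100 samples per second,
# 2 bytes per sample.
audio_rate = raw_data_rate(2, 44_100, 2)

print(f"video: {video_rate / 1e6:.1f} MB/s")   # 186.6 MB/s
print(f"audio: {audio_rate / 1e6:.3f} MB/s")   # 0.176 MB/s
print(f"ratio: {video_rate / audio_rate:.0f}x")
```

Under these assumptions the video stream carries roughly a thousand times more raw data per second than the audio stream, which illustrates why audio processing became tractable on general-purpose hardware much earlier.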
This may sound like one would only need to wait to see computers solve image
interpretation problems faster and better than humans do, but this is not the case.
While dedicated computer vision systems already outperform humans in terms of
processing speed, the interpretation quality does not reach human level. Current
computer vision systems are usually employed in very limited domains. Examples
include quality control, license plate identification, ZIP code reading for mail
sorting, and image registration in medical applications. All these systems provide
a way to indicate a lack of confidence, e.g. by rejecting ambiguous examples,
which are then inspected by human experts. Such partially automated systems are
nevertheless useful, because they free the experts from inspecting the vast
majority of unproblematic examples. The need to incorporate a human component
in such systems clearly underlines the superiority of the human visual system, even
for tasks in such limited domains.
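The rejection mechanism described above can be sketched as a simple confidence threshold. This is a minimal illustration, not the method of any particular system: the function names, the threshold value of 0.9, and the assumption that the classifier outputs per-class confidence scores in [0, 1] are all hypothetical.

```python
def classify_with_reject(scores, threshold=0.9):
    """Return the index of the best-scoring class, or None to reject.

    scores: per-class confidence values, assumed to lie in [0, 1].
    threshold: assumed minimum confidence for an automatic decision.
    """
    best = max(range(len(scores)), key=lambda i: scores[i])
    if scores[best] < threshold:
        return None  # ambiguous example: leave it to a human expert
    return best


def route(examples, threshold=0.9):
    """Split (name, scores) pairs into automatic decisions and a review queue."""
    accepted, for_review = [], []
    for name, scores in examples:
        label = classify_with_reject(scores, threshold)
        if label is None:
            for_review.append(name)
        else:
            accepted.append((name, label))
    return accepted, for_review


# Hypothetical scores for three scanned digits with two candidate classes.
examples = [
    ("scan_a", [0.02, 0.98]),
    ("scan_b", [0.55, 0.45]),  # ambiguous
    ("scan_c", [0.95, 0.05]),
]
accepted, for_review = route(examples)
print(accepted)    # [('scan_a', 1), ('scan_c', 0)]
print(for_review)  # ['scan_b']
```

Raising the threshold sends more examples to the experts and lowers the risk of automatic errors; lowering it does the opposite. Choosing the value is an application-specific trade-off.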
Depending on the application, computer vision algorithms try to extract different
aspects of the information contained in an image or a video stream. For example,
one may desire to infer a structural object model from a sequence of images that
show a moving object. In this case, the object structure is preserved, while motion
information is discarded. On the other hand, for the control of mobile robots,
analysis may start with a model of the environment in order to match it with the input
and to infer robot motion.
Two main approaches exist for the interpretation of images: bottom-up and top-
down. Figure 1.9 depicts the feed-forward image processing chain of bottom-up