5.1 Introduction
In recent years, with the widespread diffusion of 3D sensors, there has been increasing
interest in consumer and research applications based on dense range data. Some of
these sensors provide a depth map and an RGB (or monochrome) image of the sensed
scene and, for this reason, they are often referred to as RGBD (RGB plus Depth)
sensors. A well-known and representative example of such devices is the Microsoft
Kinect, a cheap and accurate RGBD sensor based on structured light technology.
Since its presentation in 2010, it has been deployed in many scientific and consumer
applications. This technology, developed by Prime Sense, relies on a standard color
camera, an infrared projector, and an infrared camera. The projected pattern is sensed
by the infrared camera and analyzed according to a patented technology in order to
infer depth. The Kinect enables the user to obtain accurate depth maps and images
at VGA resolution in indoor environments. Another interesting technology that has gained popularity in recent years is Time of Flight (ToF). In this case, the sensor emits modulated light and, by measuring the time required for the light to bounce back, infers depth. In most cases, this technology also provides a monochrome image of
the sensed scene and hence belongs to the class of RGBD sensors. However, compared to structured light sensors such as the Kinect, as well as to stereo vision based sensors, ToF technology currently provides depth maps and images at a reduced resolution. Nevertheless, for its new gaming console, Microsoft recently presented an evolution of the original Kinect based on time-of-flight technology, enabling increased resolution compared to other time-of-flight sensors currently available.
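As a point of reference, the textbook relations behind this measurement (standard formulas, not tied to any specific device discussed here) link depth to the round-trip time \tau, which continuous-wave ToF sensors recover from the phase shift \Delta\varphi of the reflected signal at modulation frequency f_m:

\[
d = \frac{c\,\tau}{2}, \qquad
\tau = \frac{\Delta\varphi}{2\pi f_m}
\;\Longrightarrow\;
d = \frac{c\,\Delta\varphi}{4\pi f_m},
\]

where c is the speed of light; since the phase wraps at 2\pi, the unambiguous measurement range is c / (2 f_m).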
Active technologies have specific strengths and limitations [25]; however, they are
ill-suited to environments flooded with sunlight (the Kinect in particular becomes
useless in these circumstances). On the other hand, it is worth observing that, in
stereo vision technology, depth and image resolutions are only constrained by the
computational requirements of the stereo matching algorithm. For these reasons,
especially for the limitations concerned with ToF sensors, there have been attempts
to improve resolution and effectiveness of active sensors by means of sensor fusion
techniques (e.g., [ 6 ]). These approaches combine the depth maps provided by active
sensors with registered images and depth maps provided by high-resolution stereo
vision systems.
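As a toy illustration of the idea, and decidedly not the method of [6], the sketch below blends a registered low-resolution active depth map with a high-resolution stereo depth map; the nearest-neighbor upsampling and the fixed blending weight are simplifying assumptions made only for this example.

    import numpy as np

    # Illustrative fusion of a registered low-resolution active depth map
    # with a high-resolution stereo depth map. Zeros mark missing depth.
    def fuse_depth(active_lowres, stereo, w_active=0.5):
        h, w = stereo.shape
        # Nearest-neighbor upsampling of the active map to the stereo
        # resolution (assumes the maps are already registered).
        ys = np.arange(h) * active_lowres.shape[0] // h
        xs = np.arange(w) * active_lowres.shape[1] // w
        active = active_lowres[np.ix_(ys, xs)]
        both = (active > 0) & (stereo > 0)
        # Fixed-weight blend where both sensors answer; otherwise keep the
        # one available measurement (or zero when neither produced depth).
        return np.where(both,
                        w_active * active + (1.0 - w_active) * stereo,
                        np.where(active > 0, active, stereo))

    # Example (hypothetical sizes): fuse a 160x120 ToF map with 640x480 stereo.
    # fused = fuse_depth(tof_depth, stereo_depth, w_active=0.6)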
Stereo vision is a well-known technology for inferring depth and, excluding
projection-based approaches, it is a passive technology based on standard imaging
sensors. Stereo vision systems infer dense depth maps by identifying corresponding
projections of the same 3D point sensed by two or more cameras in different positions.
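To make this concrete, the sketch below implements the simplest local approach to finding such correspondences, fixed-window block matching with a sum-of-absolute-differences (SAD) cost and winner-takes-all selection on a rectified pair, followed by the standard triangulation Z = f B / d that converts disparity into depth. All names and parameters are illustrative choices for this sketch; real algorithms add cost aggregation, consistency checks, and subpixel refinement.

    import numpy as np

    # Minimal winner-takes-all block matching on a rectified grayscale pair.
    # Purely illustrative: O(h * w * max_disp) in pure Python, so very slow.
    def block_matching(left, right, max_disp=64, window=5):
        left = left.astype(np.float32)
        right = right.astype(np.float32)
        h, w = left.shape
        half = window // 2
        disparity = np.zeros((h, w), dtype=np.float32)
        for y in range(half, h - half):
            for x in range(half + max_disp, w - half):
                patch = left[y - half:y + half + 1, x - half:x + half + 1]
                best_d, best_cost = 0, np.inf
                # Rectification confines the search to the same scanline.
                for d in range(max_disp):
                    cand = right[y - half:y + half + 1,
                                 x - d - half:x - d + half + 1]
                    cost = np.abs(patch - cand).sum()   # SAD matching cost
                    if cost < best_cost:
                        best_cost, best_d = cost, d
                disparity[y, x] = best_d
        return disparity

    # Triangulation for a rectified binocular setup: Z = f * B / d, with
    # focal length f in pixels, baseline B in meters, disparity d in pixels.
    def depth_from_disparity(disparity, focal_px, baseline_m):
        with np.errstate(divide="ignore"):
            return np.where(disparity > 0,
                            focal_px * baseline_m / disparity, 0.0)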
Identifying correspondences is a challenging task, often referred to as the correspondence problem; it can be tackled with many algorithms (the Middlebury stereo evaluation website [28] provides a regularly updated list and evaluation of stereo vision algorithms), with widely different outcomes in terms of accuracy and computational requirements. This means that, in stereo vision, the algorithm aimed at tackling the correspondence problem plays a major role in the overall technology and, in recent years, there has been dramatic improvement in this area. Another important factor that has made stereo