map can be estimated for each image by stereo matching based on reprojecting via the model, similar to Figure 8.25. This approach can be viewed as an early multi-view stereo algorithm in which the 3D points are constrained to lie on geometric primitives interactively created by the user.
Two notable early multi-view stereo algorithms were proposed by Okutomi and Kanade [354] and Collins [100]. Another approach to multi-view stereo not discussed here is photometric stereo, in which the 3D shape of a shiny object (e.g., a ceramic statue) is estimated by acquiring multiple images of it under different illumination conditions (e.g., [196, 521]). The changing intensity patterns provide clues about the normal vector at each surface point. Nehab et al. [347] observed that normals estimated from triangulation-based scanners could be improved by combining the data with the output of photometric stereo techniques.
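To make the idea concrete, the classic Lambertian formulation of photometric stereo reduces to a small linear system: a pixel's intensity under the k-th light is I_k = ρ(n·l_k), so stacking K ≥ 3 measurements lets us solve for g = ρn by least squares. The sketch below assumes known, distant light directions and a shadow-free Lambertian surface; it illustrates the basic principle, not the specific formulations of the cited works.

```python
import numpy as np

def photometric_stereo(images, light_dirs):
    """Recover per-pixel normals and albedo from K >= 3 images
    taken under known distant lights (Lambertian assumption).

    images:     (K, H, W) grayscale images
    light_dirs: (K, 3) unit light direction vectors
    """
    K, H, W = images.shape
    I = images.reshape(K, -1)                    # (K, H*W) stacked intensities
    # Lambertian model: I_k = albedo * dot(n, l_k), i.e. L @ g = i
    # with g = albedo * n; solve for g at every pixel by least squares.
    g, _, _, _ = np.linalg.lstsq(light_dirs, I, rcond=None)  # (3, H*W)
    albedo = np.linalg.norm(g, axis=0)
    normals = g / np.maximum(albedo, 1e-8)       # unit surface normals
    return normals.T.reshape(H, W, 3), albedo.reshape(H, W)
```

The recovered normal field can then be integrated into a height map, or, as in Nehab et al.'s approach, used to refine noisy normals from a triangulation-based scanner.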
Two exciting avenues of research have recently been enabled by the confluence of commercial 3D scanning technology, ample processing power and storage, and massive internet photography databases. In one direction, the thousands of images resulting from a keyword search on Flickr or Google Images can be viewed as the input to a large multi-view stereo problem. Snavely et al. [464] described how to calibrate the cameras underlying such a collection based on correspondence estimation and structure from motion, and how to then apply multi-view stereo techniques to obtain a dense 3D reconstruction of the scene. In contrast to conventional multi-view stereo techniques, this type of approach simply discards entire images that are of low quality or for which the camera calibration is uncertain; indeed, a key component of these large-scale algorithms is the careful choice of image sets that are likely to be productive.
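As a flavor of what such view selection might look like, the toy score below rewards image pairs with many shared feature matches and a triangulation angle near an ideal value (tiny baselines give unstable depth), and simply drops images that score poorly. The scoring function, its parameters, and the threshold are illustrative assumptions, not the actual criteria used in [464].

```python
import numpy as np

def view_score(shared_matches, baseline_angle_deg,
               ideal_angle=20.0, sigma=10.0):
    """Toy view-selection score for internet-scale MVS:
    many shared matches are good; angles far from the ideal
    triangulation angle are penalized by a Gaussian weight."""
    angle_weight = np.exp(-(baseline_angle_deg - ideal_angle) ** 2
                          / (2.0 * sigma ** 2))
    return shared_matches * angle_weight

# Example: a near-duplicate viewpoint scores low and is discarded.
scores = {"img_a.jpg": view_score(850, 18.0),   # wide baseline, many matches
          "img_b.jpg": view_score(40, 2.0)}     # near-duplicate viewpoint
selected = [name for name, s in scores.items() if s > 100.0]
```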
Another exciting direction is city-scale scanning, using a vehicle equipped with some combination of cameras, laser rangefinders, and GPS/inertial navigation units to help with its localization. Pollefeys et al. [367] described an impressive system that generates textured 3D mesh reconstructions in real time using a vehicle mounted with eight cameras. They used an incremental depth map fusion algorithm to process tens of thousands of video frames into a single, consistent, and detailed 3D model.
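A minimal per-pixel sketch of the fusion idea follows: given several depth maps already warped into a common reference view, keep a pixel's median depth only if enough individual estimates agree with it. This toy consensus test stands in for the real thing; it is not Pollefeys et al.'s incremental algorithm, and the warping step is omitted.

```python
import numpy as np

def fuse_depth_maps(depths, tol=0.05, min_support=3):
    """Fuse K depth maps (already warped into one reference view)
    by a robust per-pixel consensus.

    depths: (K, H, W) array; np.nan marks missing estimates.
    A pixel survives only if at least min_support of the K depths
    agree with the median to within relative tolerance tol.
    """
    med = np.nanmedian(depths, axis=0)                    # (H, W)
    rel_err = np.abs(depths - med) / np.maximum(med, 1e-8)
    support = np.sum(rel_err < tol, axis=0)               # agreeing views
    return np.where(support >= min_support, med, np.nan)
```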
Alternately, Früh and Zakhor [158] designed a vehicle equipped with a camera and two laser rangefinders. One rangefinder acquired vertical 3D strips of building façades, which were registered using images from the camera and horizontal 3D data from the second rangefinder. The 3D datasets estimated from the vehicle were then refined (e.g., to remove drift) based on registration to an aerial map of the scanned area. This system registered thousands of images and 3D strips to produce an accurate textured model of streets around the Berkeley campus. Subsequent work addressed the problem of inpainting façade geometry and texture in LiDAR “shadows” caused by foreground occlusions [156]. For large holes, a patch-based inpainting approach inspired by the techniques in Section 3.4.2 might be more appropriate (e.g., [124]).
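The drift-removal step is essentially rigid registration between ground-acquired scans and the aerial map. The toy 2D ICP loop below conveys the idea: alternate nearest-neighbor matching with a closed-form rigid alignment. It is a generic sketch (brute-force matching, no outlier rejection), not Früh and Zakhor's actual method, which also exploits the camera imagery.

```python
import numpy as np

def icp_2d(src, dst, iters=30):
    """Toy 2D ICP: align point set src (e.g., a horizontal laser
    scan) to dst (e.g., edge points from an aerial map).
    Returns rotation R (2x2) and translation t (2,)."""
    R, t = np.eye(2), np.zeros(2)
    cur = src.copy()
    for _ in range(iters):
        # 1. Nearest-neighbor correspondences (brute force).
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
        matches = dst[np.argmin(d2, axis=1)]
        # 2. Closed-form rigid alignment (Kabsch / Procrustes).
        mu_s, mu_d = cur.mean(0), matches.mean(0)
        H = (cur - mu_s).T @ (matches - mu_d)
        U, _, Vt = np.linalg.svd(H)
        Ri = Vt.T @ U.T
        if np.linalg.det(Ri) < 0:        # guard against reflection
            Vt[-1] *= -1
            Ri = Vt.T @ U.T
        ti = mu_d - Ri @ mu_s
        # 3. Apply and accumulate the incremental transform.
        cur = cur @ Ri.T + ti
        R, t = Ri @ R, Ri @ t + ti
    return R, t
```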
Finally, we mention that once 3D data has been acquired by any of the means discussed in this chapter, several image understanding techniques can be applied to it.²³ For example, Verma et al. [513] discussed how to detect and model buildings in aerial LiDAR data.
²³ “Image understanding” is used in a broad sense here; automatic analysis and understanding of 3D data typically falls under the umbrella of computer vision, even if there weren't any conventional images actually involved in the data collection.
 