map can be estimated for each image by stereo matching based on reprojecting via the model, similar to Figure 8.25. This approach can be viewed as an early multi-view stereo algorithm in which the 3D points are constrained to lie on geometric primitives interactively created by the user.
Two notable early multi-view stereo algorithms were proposed by Okutomi and Kanade [354] and Collins [100]. Another approach to multi-view stereo not discussed here is photometric stereo, in which the 3D shape of a shiny object (e.g., a ceramic statue) is estimated by acquiring multiple images of it under different illumination conditions (e.g., [196, 521]). The changing intensity patterns provide clues about the normal vector at each surface point. Nehab et al. [347] observed that normals estimated from triangulation-based scanners could be improved by combining the data with the output of photometric stereo techniques.
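To make the idea concrete, the classic Lambertian formulation of photometric stereo reduces to a small linear system: a pixel's intensity under the k-th light is I_k = ρ(n·l_k), so stacking K ≥ 3 measurements lets us solve for g = ρn by least squares. The sketch below assumes known, distant light directions and a shadow-free Lambertian surface; it illustrates the basic principle, not the specific formulations of the cited works.

```python
import numpy as np

def photometric_stereo(images, light_dirs):
    """Recover per-pixel normals and albedo from K >= 3 images
    taken under known distant lights (Lambertian assumption).

    images:     (K, H, W) grayscale images
    light_dirs: (K, 3) unit light direction vectors
    """
    K, H, W = images.shape
    I = images.reshape(K, -1)                    # (K, H*W) stacked intensities
    # Lambertian model: I_k = albedo * dot(n, l_k), i.e. L @ g = i
    # with g = albedo * n; solve for g at every pixel by least squares.
    g, _, _, _ = np.linalg.lstsq(light_dirs, I, rcond=None)  # (3, H*W)
    albedo = np.linalg.norm(g, axis=0)
    normals = g / np.maximum(albedo, 1e-8)       # unit surface normals
    return normals.T.reshape(H, W, 3), albedo.reshape(H, W)
```

The recovered normal field can then be integrated into a height map, or, as in Nehab et al.'s approach, used to refine noisy normals from a triangulation-based scanner.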
Two exciting avenues of research have recently been enabled by the confluence of commercial 3D scanning technology, ample processing power and storage, and massive internet photography databases. In one direction, the thousands of images resulting from a keyword search on Flickr or Google Images can be viewed as the input to a large multi-view stereo problem. Snavely et al. [464] described how to calibrate the cameras underlying such a collection based on correspondence estimation and structure from motion, and how to then apply multi-view stereo techniques to obtain a dense 3D reconstruction of the scene. In contrast to conventional multi-view stereo techniques, this type of approach simply discards entire images that are of low quality or for which the camera calibration is uncertain; indeed, a key component of these large-scale algorithms is the careful choice of image sets that are likely to be productive.
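As a flavor of what such view selection might look like, the toy score below rewards image pairs with many shared feature matches and a triangulation angle near an ideal value (tiny baselines give unstable depth), and simply drops images that score poorly. The scoring function, its parameters, and the threshold are illustrative assumptions, not the actual criteria used in [464].

```python
import numpy as np

def view_score(shared_matches, baseline_angle_deg,
               ideal_angle=20.0, sigma=10.0):
    """Toy view-selection score for internet-scale MVS:
    many shared matches are good; angles far from the ideal
    triangulation angle are penalized by a Gaussian weight."""
    angle_weight = np.exp(-(baseline_angle_deg - ideal_angle) ** 2
                          / (2.0 * sigma ** 2))
    return shared_matches * angle_weight

# Example: a near-duplicate viewpoint scores low and is discarded.
scores = {"img_a.jpg": view_score(850, 18.0),   # wide baseline, many matches
          "img_b.jpg": view_score(40, 2.0)}     # near-duplicate viewpoint
selected = [name for name, s in scores.items() if s > 100.0]
```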
Another exciting direction is city-scale scanning, using a vehicle equipped with some combination of cameras, laser rangefinders, and GPS/inertial navigation units to help with its localization. Pollefeys et al. [367] described an impressive system that generates textured 3D mesh reconstructions in real time using a vehicle mounted with eight cameras. They used an incremental depth map fusion algorithm to process tens of thousands of video frames into a single, consistent, and detailed 3D model.
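A minimal per-pixel sketch of the fusion idea follows: given several depth maps already warped into a common reference view, keep a pixel's median depth only if enough individual estimates agree with it. This toy consensus test stands in for the real thing; it is not Pollefeys et al.'s incremental algorithm, and the warping step is omitted.

```python
import numpy as np

def fuse_depth_maps(depths, tol=0.05, min_support=3):
    """Fuse K depth maps (already warped into one reference view)
    by a robust per-pixel consensus.

    depths: (K, H, W) array; np.nan marks missing estimates.
    A pixel survives only if at least min_support of the K depths
    agree with the median to within relative tolerance tol.
    """
    med = np.nanmedian(depths, axis=0)                    # (H, W)
    rel_err = np.abs(depths - med) / np.maximum(med, 1e-8)
    support = np.sum(rel_err < tol, axis=0)               # agreeing views
    return np.where(support >= min_support, med, np.nan)
```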
Alternately, Früh and Zakhor [158] designed a vehicle equipped with a camera and two laser rangefinders. One rangefinder acquired vertical 3D strips of building façades, which were registered using images from the camera and horizontal 3D data from the second rangefinder. The 3D datasets estimated from the vehicle were then refined (e.g., to remove drift) based on registration to an aerial map of the scanned area. This system registered thousands of images and 3D strips to produce an accurate textured model of streets around the Berkeley campus. Subsequent work addressed the problem of inpainting façade geometry and texture in LiDAR “shadows” caused by foreground occlusions [156]. For large holes, a patch-based inpainting approach inspired by the techniques in Section 3.4.2 might be more appropriate (e.g., [124]).
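The drift-removal step is essentially rigid registration between ground-acquired scans and the aerial map. The toy 2D ICP loop below conveys the idea: alternate nearest-neighbor matching with a closed-form rigid alignment. It is a generic sketch (brute-force matching, no outlier rejection), not Früh and Zakhor's actual method, which also exploits the camera imagery.

```python
import numpy as np

def icp_2d(src, dst, iters=30):
    """Toy 2D ICP: align point set src (e.g., a horizontal laser
    scan) to dst (e.g., edge points from an aerial map).
    Returns rotation R (2x2) and translation t (2,)."""
    R, t = np.eye(2), np.zeros(2)
    cur = src.copy()
    for _ in range(iters):
        # 1. Nearest-neighbor correspondences (brute force).
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
        matches = dst[np.argmin(d2, axis=1)]
        # 2. Closed-form rigid alignment (Kabsch / Procrustes).
        mu_s, mu_d = cur.mean(0), matches.mean(0)
        H = (cur - mu_s).T @ (matches - mu_d)
        U, _, Vt = np.linalg.svd(H)
        Ri = Vt.T @ U.T
        if np.linalg.det(Ri) < 0:        # guard against reflection
            Vt[-1] *= -1
            Ri = Vt.T @ U.T
        ti = mu_d - Ri @ mu_s
        # 3. Apply and accumulate the incremental transform.
        cur = cur @ Ri.T + ti
        R, t = Ri @ R, Ri @ t + ti
    return R, t
```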
Finally, we mention that once 3D data has been acquired by any of the means discussed in this chapter, several image understanding techniques can be applied to it.²³ For example, Verma et al. [513] discussed how to detect and model buildings in aerial LiDAR data.
²³ “Image understanding” is used in a broad sense here; automatic analysis and understanding of 3D data typically falls under the umbrella of computer vision, even if there weren't any conventional images actually involved in the data collection.
 