calculation in Equation (8.8) with one that depends simply on intensity ratios, and
Zhang and Yau [572] used two fringe images and a flat (projector-fully-on) image to
mitigate measurement errors and increase processing speed. Weise et al. [540] noted
that moving objects inevitably generate "ripple" artifacts in 3D, since the assumption
that the same pixel location in all three images in Equation (8.7) corresponds to the
same scene point is incorrect. They proposed a method to estimate and compensate
for the underlying motion to remove the artifacts.
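To make the phase-recovery step concrete, here is a minimal sketch of the standard three-step phase-shifting computation, assuming fringe images captured with phase shifts of -2*pi/3, 0, and +2*pi/3 (a common convention; Equation (8.7) may use a different parameterization):

import numpy as np

def wrapped_phase(i1, i2, i3):
    # Three-step phase shifting: each image is I_k = A + B*cos(phi + d_k)
    # with assumed shifts d = -2*pi/3, 0, +2*pi/3. Eliminating A and B
    # gives tan(phi) = sqrt(3)*(I1 - I3) / (2*I2 - I1 - I3).
    # Inputs are HxW intensity arrays; the result is the phase
    # wrapped into (-pi, pi].
    return np.arctan2(np.sqrt(3.0) * (i1 - i3), 2.0 * i2 - i1 - i3)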
Phase unwrapping is a major challenge for fringe-projection methods, and there
is a vast literature on methods to solve the problem (e.g., see [166]). Luckily, in
applications where real-time performance is required (e.g., real-time 3D measurement
of facial expressions), the surface generally varies smoothly enough that phase
unwrapping is tractable (except in problematic regions like facial hair).
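As a simple illustration of why smooth surfaces make unwrapping easy, the following toy sketch unwraps one scanline of wrapped phase by adding a multiple of 2*pi whenever adjacent samples jump by more than pi (numpy.unwrap implements the same idea); it is not robust to the discontinuities mentioned above:

import numpy as np

def unwrap_row(phi, tol=np.pi):
    # phi: 1D array of wrapped phase values in (-pi, pi].
    # Whenever successive samples jump by more than `tol`, assume a
    # 2*pi wrap occurred and adjust all subsequent samples.
    out = phi.copy()
    offset = 0.0
    for k in range(1, len(phi)):
        delta = phi[k] - phi[k - 1]
        if delta > tol:
            offset -= 2.0 * np.pi
        elif delta < -tol:
            offset += 2.0 * np.pi
        out[k] = phi[k] + offset
    return out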
8.3 MULTI-VIEW STEREO
The final technology we'll discuss for obtaining detailed 3D measurements of an
object or scene is multi-view stereo (MVS). This term covers a large class of methods
with the common theme that only a set of source images {I_1, ..., I_M} from calibrated
cameras is used as the basis for the 3D estimation problem. In contrast to LiDAR and
structured light methods, multi-view stereo algorithms are passive, meaning that the
sensing technology doesn't interfere at all with the scene.
We can think of multi-view stereo as a combination of the material in Chapters 5
and 6. That is, first a set of cameras (typically ten or more) is accurately calibrated,
either using a calibration device or by matching features in a natural scene.
Then, region correspondence techniques are adapted from the stereo literature to
obtain dense correspondences between pairs of images and across sets of images.
Since the cameras are calibrated, triangulating these correspondences leads to 3D
measurements of scene points.
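As a sketch of the triangulation step, the following function (the names are our own, not from the text) performs standard linear (DLT) triangulation of a single scene point from its projections in M calibrated views:

import numpy as np

def triangulate(points, cameras):
    # points  : list of (x, y) pixel coordinates, one per view
    # cameras : list of 3x4 projection matrices P = K [R | t]
    # Each view contributes two rows of the homogeneous system A X = 0;
    # the least-squares solution is the right singular vector of A
    # with the smallest singular value.
    rows = []
    for (x, y), P in zip(points, cameras):
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.asarray(rows)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # dehomogenize to a 3D point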
In this section we overview four general approaches to multi-view stereo. The first
set of volumetric methods represents the scene as a finely sampled set of colored
voxels, and selects a set of voxels whose shape and color are consistent with all the
images. The second set of surface deformation methods evolves a mesh or level-set
function to enclose the final set of 3D points using partial-differential-equation-based
techniques. The third set of patch-based methods generates small 3D planar patches
in the scene by triangulating multi-image feature matches, and grows these patches
in 3D to account for as much of the scene as possible. Finally, the fourth set of depth
map fusion methods begins with dense depth maps obtained from stereo pairs and
tries to incrementally fuse them into a unified set of 3D points. 14
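To give a flavor of the volumetric approach, here is a deliberately simplified photo-consistency test for a single voxel; real algorithms such as voxel coloring and space carving add careful visibility reasoning, which this sketch omits:

import numpy as np

def photo_consistent(voxel, images, cameras, thresh=30.0):
    # voxel   : 3-vector, voxel center in world coordinates
    # images  : list of HxWx3 color images (float arrays)
    # cameras : list of 3x4 projection matrices
    # Project the voxel center into every image and keep the voxel
    # only if the sampled colors agree (low per-channel spread).
    X = np.append(voxel, 1.0)            # homogeneous coordinates
    colors = []
    for img, P in zip(images, cameras):
        u, v, w = P @ X
        if w <= 0:                       # behind the camera
            continue
        x, y = int(round(u / w)), int(round(v / w))
        h, wd = img.shape[:2]
        if 0 <= x < wd and 0 <= y < h:   # inside the image
            colors.append(img[y, x])
    if len(colors) < 2:
        return False                     # too few observations
    return np.std(np.asarray(colors), axis=0).max() < thresh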
Seitz et al. [433] gave an important overview of the multi-view stereo literature
as of 2006, and contributed a carefully ground-truthed evaluation benchmark used
by most modern multi-view stereo researchers. This benchmark 15 catalyzed multi-
view stereo research in the same way that the previous Middlebury benchmarks did
for two-view stereo.
14 Algorithms that only produce a sparse, irregular set of 3D points (e.g., the recovered 3D points
produced by a matchmoving algorithm) are not considered to be multi-view stereo algorithms.
15 Datasets and continually updated results are available at http://vision.middlebury.edu/mview/.