calculation in Equation (8.8) with one that depends simply on intensity ratios, and
Zhang and Yau [572] used two fringe images and a flat (projector-fully-on) image to
mitigate measurement errors and increase processing speed. Weise et al. [540] noted
that moving objects inevitably generate "ripple" artifacts in 3D, since the assumption
that the same pixel location in all three images in Equation (8.7) corresponds to the
same scene point is incorrect. They proposed a method to estimate and compensate
for the underlying motion to remove the artifacts.
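To make the phase-recovery step concrete, here is a minimal sketch of the standard three-step phase-shifting computation, assuming fringe images captured with phase shifts of -2*pi/3, 0, and +2*pi/3 (a common convention; Equation (8.7) may use a different parameterization):

import numpy as np

def wrapped_phase(i1, i2, i3):
    # Three-step phase shifting: each image is I_k = A + B*cos(phi + d_k)
    # with assumed shifts d = -2*pi/3, 0, +2*pi/3. Eliminating A and B
    # gives tan(phi) = sqrt(3)*(I1 - I3) / (2*I2 - I1 - I3).
    # Inputs are HxW intensity arrays; the result is the phase
    # wrapped into (-pi, pi].
    return np.arctan2(np.sqrt(3.0) * (i1 - i3), 2.0 * i2 - i1 - i3)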
Phase unwrapping is a major challenge for fringe-projection methods, and there
is a vast literature on methods to solve the problem (e.g., see [166]). Luckily, in
applications where real-time performance is required (e.g., real-time 3D measurement
of facial expressions), the surface generally varies smoothly enough that phase
unwrapping is tractable (except in problematic regions like facial hair).
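As a simple illustration of why smooth surfaces make unwrapping easy, the following toy sketch unwraps one scanline of wrapped phase by adding a multiple of 2*pi whenever adjacent samples jump by more than pi (numpy.unwrap implements the same idea); it is not robust to the discontinuities mentioned above:

import numpy as np

def unwrap_row(phi, tol=np.pi):
    # phi: 1D array of wrapped phase values in (-pi, pi].
    # Whenever successive samples jump by more than `tol`, assume a
    # 2*pi wrap occurred and adjust all subsequent samples.
    out = phi.copy()
    offset = 0.0
    for k in range(1, len(phi)):
        delta = phi[k] - phi[k - 1]
        if delta > tol:
            offset -= 2.0 * np.pi
        elif delta < -tol:
            offset += 2.0 * np.pi
        out[k] = phi[k] + offset
    return out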
8.3 MULTI-VIEW STEREO
The final technology we'll discuss for obtaining detailed 3D measurements of an
object or scene is multi-view stereo (MVS). This term covers a large class of methods
with the common theme that only a set of source images {I_1, ..., I_M} from calibrated
cameras is used as the basis for the 3D estimation problem. In contrast to LiDAR and
structured light methods, multi-view stereo algorithms are passive, meaning that the
sensing technology doesn't interfere at all with the scene.
We can think of multi-view stereo as a combination of the material in Chapters 5
and 6. That is, first a set of cameras (typically ten or more) is accurately calibrated,
either using a calibration device or by matching features in a natural scene.
Then, region correspondence techniques are adapted from the stereo literature to
obtain dense correspondences between pairs of images and across sets of images.
Since the cameras are calibrated, triangulating these correspondences leads to 3D
measurements of scene points.
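As a sketch of the triangulation step, the following function (the names are our own, not from the text) performs standard linear (DLT) triangulation of a single scene point from its projections in M calibrated views:

import numpy as np

def triangulate(points, cameras):
    # points  : list of (x, y) pixel coordinates, one per view
    # cameras : list of 3x4 projection matrices P = K [R | t]
    # Each view contributes two rows of the homogeneous system A X = 0;
    # the least-squares solution is the right singular vector of A
    # with the smallest singular value.
    rows = []
    for (x, y), P in zip(points, cameras):
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.asarray(rows)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # dehomogenize to a 3D point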
In this section we overview four general approaches to multi-view stereo. The first
set of volumetric methods represents the scene as a finely sampled set of colored
voxels, and selects a set of voxels whose shape and color are consistent with all the
images. The second set of surface deformation methods evolves a mesh or level-set
function to enclose the final set of 3D points using partial-differential-equation-based
techniques. The third set of patch-based methods generates small 3D planar patches
in the scene by triangulating multi-image feature matches, and grows these patches
in 3D to account for as much of the scene as possible. Finally, the fourth set of depth
map fusion methods begins with dense depth maps obtained from stereo pairs and
tries to incrementally fuse them into a unified set of 3D points. 14
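To give a flavor of the volumetric approach, here is a deliberately simplified photo-consistency test for a single voxel; real algorithms such as voxel coloring and space carving add careful visibility reasoning, which this sketch omits:

import numpy as np

def photo_consistent(voxel, images, cameras, thresh=30.0):
    # voxel   : 3-vector, voxel center in world coordinates
    # images  : list of HxWx3 color images (float arrays)
    # cameras : list of 3x4 projection matrices
    # Project the voxel center into every image and keep the voxel
    # only if the sampled colors agree (low per-channel spread).
    X = np.append(voxel, 1.0)            # homogeneous coordinates
    colors = []
    for img, P in zip(images, cameras):
        u, v, w = P @ X
        if w <= 0:                       # behind the camera
            continue
        x, y = int(round(u / w)), int(round(v / w))
        h, wd = img.shape[:2]
        if 0 <= x < wd and 0 <= y < h:   # inside the image
            colors.append(img[y, x])
    if len(colors) < 2:
        return False                     # too few observations
    return np.std(np.asarray(colors), axis=0).max() < thresh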
Seitz et al. [433] gave an important overview of the multi-view stereo literature
as of 2006, and contributed a carefully ground-truthed evaluation benchmark used
by most modern multi-view stereo researchers. This benchmark 15 catalyzed multi-
view stereo research in the same way that the previous Middlebury benchmarks did
for two-view stereo.
14 Algorithms that only produce a sparse, irregular set of 3D points (e.g., the recovered 3D points
produced by a matchmoving algorithm) are not considered to be multi-view stereo algorithms.
15 Datasets and continually updated results are available at http://vision.middlebury.edu/mview/.