accelerated research in stereo and optical flow. The two main evaluation datasets
are based on roughly constant-reflectance models approximately ten centimeters on
a side, captured from hundreds of viewpoints distributed on a hemisphere (see
Figure 8.27a). Strecha et al. [470] later contributed a benchmarking dataset for large-
scale multi-view stereo algorithms, using high-resolution images of buildings many
meters on a side.
It's important to note that while modern multi-view stereo results are qualitatively
quite impressive, and quantitatively (i.e., sub-millimeter) accurate for small objects,
purely image-based techniques are not yet ready to replace LiDAR systems for highly
accurate, large-scale 3D data acquisition. For example, Strecha et al. [470] estimated
that for large outdoor scenes, only forty to sixty percent of the 3D points for a top MVS
algorithm applied to high-resolution images were within three standard deviations
of the noise level of a LiDAR scanner, while ten to thirty percent of the ground truth
measurements were missing or wildly inaccurate in the MVS result. For this reason,
multi-view stereo papers typically use LiDAR or structured light results as the ground
truth for their algorithm comparisons. Multi-view stereo algorithms can also be quite
computationally expensive and hence slow, another drawback compared to near-
real-time structured light systems.
8.3.1 Volumetric Methods
Volumetric methods for 3D data acquisition share similarities with the problem of
finding the visual hull from silhouettes of an object, as discussed in Section 7.7.3 .
As in the visual hull problem, we require a set of accurately calibrated cameras, and
represent 3D space as a set of occupied voxels. The finer the voxelization of the space
is, the more accurate the 3D reconstruction will be. However, in the multi-view stereo
problem, we also use the colors of the pixels inside the object silhouette — not only
to color the voxels on the resulting 3D surface but also to remove voxels inconsistent
with the observed color in some source image. Therefore, the result of a volumetric
multi-view stereo method is usually a subset of the visual hull with colors associated
to each surface voxel, so that rendering the voxels from the point of view of each
camera should produce a similar image to what was actually acquired.
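As a concrete illustration of this representation, the following minimal sketch keeps only those voxels whose centers project inside every silhouette, i.e., the visual-hull portion of the computation that the color-consistency test then refines. It assumes pinhole cameras given as 3x4 projection matrices and a nearest-pixel lookup; the names project and carve_visual_hull are illustrative, not taken from any particular system.

import numpy as np

def project(P, X):
    # Project a 3D point X (length-3 array) with a 3x4 camera matrix P.
    x = P @ np.append(X, 1.0)
    return x[0] / x[2], x[1] / x[2]

def carve_visual_hull(voxel_centers, cameras, silhouettes):
    # voxel_centers: (N, 3) array of voxel center coordinates.
    # cameras:       list of 3x4 projection matrices, one per image.
    # silhouettes:   list of boolean masks, True inside the object silhouette.
    keep = np.ones(len(voxel_centers), dtype=bool)
    for P, mask in zip(cameras, silhouettes):
        h, w = mask.shape
        for i, X in enumerate(voxel_centers):
            if not keep[i]:
                continue
            u, v = project(P, X)
            ui, vi = int(round(u)), int(round(v))
            # A voxel that projects outside the silhouette in any view
            # cannot belong to the object.
            if not (0 <= vi < h and 0 <= ui < w and mask[vi, ui]):
                keep[i] = False
    return voxel_centers[keep]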
The basic voxel coloring idea originated in a paper by Seitz and Dyer [ 435 ]. In this
approach, a plane of voxels is swept through space along the direction of increasing
distance from the cameras. A special camera configuration — for example, one in which
no scene point is contained within the convex hull of the camera centers — is required.
Each voxel in the current plane is projected to the images, and the colors of the
corresponding image pixels are evaluated for consistency. If the colors are all suf-
ficiently similar (e.g., the standard deviation of the set of color measurements is
sufficiently small), the voxel is called photo-consistent, kept, and colored; otherwise,
it is removed. 16 The photo-consistency idea is illustrated in Figure 8.24 . Voxels along
lines of sight “behind” (i.e., in a depth plane after) a colored voxel are not considered
in subsequent steps.
16 This method, and multi-view stereo methods in general, perform best on Lambertian surfaces, as
opposed to specular or translucent ones. Of course, the same is true for LiDAR and structured light
methods.
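The sweep and the photo-consistency test can be sketched as follows, again assuming 3x4 projection matrices, nearest-pixel color lookup, and a simple per-channel standard-deviation threshold; the names voxel_coloring, is_photo_consistent, and sigma_max are illustrative rather than taken from Seitz and Dyer's implementation.

import numpy as np

def is_photo_consistent(colors, sigma_max=10.0):
    # colors: (K, 3) array of RGB samples, one per unoccluded view of the voxel.
    # Keep the voxel if the per-channel standard deviation is sufficiently small.
    if len(colors) < 2:
        return True
    return bool(np.all(np.std(colors, axis=0) <= sigma_max))

def voxel_coloring(planes, cameras, images, sigma_max=10.0):
    # planes:  list of (M, 3) arrays of voxel centers, ordered by increasing
    #          distance from the cameras (the sweep direction).
    # cameras: list of 3x4 projection matrices; images: list of HxWx3 arrays.
    # Returns a dict mapping kept voxel centers (as tuples) to their mean color.
    occluded = [np.zeros(img.shape[:2], dtype=bool) for img in images]
    colored = {}
    for plane in planes:
        for X in plane:
            samples, hits = [], []
            for P, img, occ in zip(cameras, images, occluded):
                x = P @ np.append(X, 1.0)            # project the voxel center
                ui, vi = int(round(x[0] / x[2])), int(round(x[1] / x[2]))
                h, w = occ.shape
                if 0 <= vi < h and 0 <= ui < w and not occ[vi, ui]:
                    samples.append(img[vi, ui].astype(float))
                    hits.append((occ, vi, ui))
            if samples and is_photo_consistent(np.array(samples), sigma_max):
                colored[tuple(X)] = np.mean(samples, axis=0)
                # Pixels claimed by this voxel become occluded, so voxels
                # "behind" it in later planes are not considered.
                for occ, vi, ui in hits:
                    occ[vi, ui] = True
    return colored

Marking pixels as occluded once a voxel is kept is what implements the rule above that voxels behind a colored voxel are not considered in subsequent steps.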
 