a surface from this volume. A simple example of this approach is the voxel-coloring algo-
rithm and its variants (Seitz and Dyer, 1997; Treuille et al., 2004). The second category of
approaches, based on voxels, level sets, and surface meshes, works by iteratively evolving a
surface to decrease or minimize a cost function. For example, from an initial volume, space
carving progressively removes inconsistent voxels. Other approaches represent the object as
an evolving mesh (Hernandez and Schmitt, 2004; Yu et al., 2006) moving as a function of
internal and external forces. In the third category are image-space methods that estimate a set
of depth maps. To ensure a single consistent 3D object interpretation, they enforce consistency
constraints between depth maps (Kolmogorov and Zabih, 2002; Gargallo and Sturm, 2005) or
merge the set of depth maps into a 3D object as a post process (Narayanan et al., 1998). The
final category groups approaches that first extract and match a set of feature points. A surface
is then fitted to the reconstructed features (Morris and Kanade, 2000; Taylor, 2003). Seitz et al.
(2006) provide an excellent overview and categorization of MVS methods. 3D face reconstruction
approaches use a combination of methods from these categories.
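The consistency constraint used by the image-space methods above can be illustrated concretely: a depth estimate in one view is accepted only if, when the pixel is back-projected to 3D and reprojected into a second view, the second view's depth map agrees. The sketch below is a minimal illustration under assumed pinhole intrinsics K and a relative pose (R, t); the function names are illustrative and not taken from the cited papers.

```python
import numpy as np

def unproject(u, v, depth, K):
    """Back-project pixel (u, v) with a given depth into camera coordinates."""
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    return np.array([x, y, depth])

def consistent(u, v, depth1, K1, K2, R, t, depth_map2, tol=0.01):
    """Check whether a depth estimate from view 1 agrees with view 2.

    R, t map view-1 camera coordinates into view-2 camera coordinates.
    """
    X1 = unproject(u, v, depth1, K1)   # 3D point in the view-1 frame
    X2 = R @ X1 + t                    # the same point in the view-2 frame
    if X2[2] <= 0:
        return False                   # behind the second camera
    p = K2 @ X2                        # project into view 2
    u2, v2 = int(round(p[0] / p[2])), int(round(p[1] / p[2]))
    h, w = depth_map2.shape
    if not (0 <= u2 < w and 0 <= v2 < h):
        return False                   # falls outside view 2
    # Consistent if the depth stored in view 2 matches the reprojected depth.
    return abs(depth_map2[v2, u2] - X2[2]) < tol * X2[2]
```

Real systems aggregate such per-pixel checks over many view pairs, but the core geometric test is this reproject-and-compare step.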
Furukawa and Ponce (2009) proposed an MVS algorithm that outputs accurate models with fine surface detail. It implements multiview stereopsis as a simple match, expand, and filter procedure.
In the matching step, a set of features localized by the Harris and difference-of-Gaussians operators is matched across multiple views, giving a sparse set of patches associated with
salient image regions. From these initial matches, the next two steps are repeated n times (n = 3 in the experiments). In the expansion step, initial matches are spread to nearby pixels to
obtain a dense set of patches. Finally in the filtering step, the visibility constraints are used to
discard incorrect matches lying either in front of or behind the observed surface. The MVS
approach proposed by Bradley et al. (2010) is based on an iterative binocular stereo method
to reconstruct seven surface patches independently, which are then merged into a single high-resolution
mesh. At this stage, face details and surface texture help guide the stereo algorithm. First,
depth maps are created from pairs of adjacent rectified viewpoints. Then the most prominent
distortions between the views are compensated by a scaled-window matching technique. The
resulting depth images are converted to 3D points and fused into a single dense point cloud. A
triangular mesh from the initial point cloud is reconstructed over three steps: down-sampling,
outlier removal, and triangle meshing. Sample reconstruction results of this approach are
shown in Figure 1.7.
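The point-cloud stages of such a pipeline, converting depth maps to 3D points and then filtering outliers before meshing, can be sketched as follows. This is a minimal illustration, not Bradley et al.'s implementation: it assumes a pinhole intrinsics matrix K and uses a simple k-nearest-neighbour statistical filter with brute-force distances (quadratic in the number of points, so only suitable for small clouds).

```python
import numpy as np

def depth_to_points(depth, K):
    """Convert a depth map to an (N, 3) array of camera-space 3D points."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    z = depth.ravel()
    x = (u.ravel() - K[0, 2]) * z / K[0, 0]
    y = (v.ravel() - K[1, 2]) * z / K[1, 1]
    pts = np.column_stack([x, y, z])
    return pts[z > 0]                      # keep only valid depths

def remove_outliers(points, k=8, factor=2.0):
    """Drop points whose mean distance to their k nearest neighbours exceeds
    `factor` times the global average of that statistic."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    d.sort(axis=1)
    mean_knn = d[:, 1:k + 1].mean(axis=1)  # index 0 is the self-distance
    return points[mean_knn < factor * mean_knn.mean()]
```

A production system would fuse points from all views into a common world frame first and would use a spatial index (k-d tree, voxel grid) for the neighbour queries.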
The 3D face acquisition approach proposed by Beeler et al. (2010) takes inspiration from Furukawa and Ponce (2010); the main difference lies in the refinement formulation. The starting point is the established approach of refining recovered 3D data on the basis of a data-driven photo-consistency term and a surface-smoothing term, which has been a long-standing research topic. The approach differs in its use of a second-order anisotropic formulation of the smoothing term, which the authors argue is particularly suited to faces. Camera calibration is achieved in a pre-processing stage.
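In generic terms (this is the common form of such refinement schemes, not the exact functional of Beeler et al.), the refined surface S minimizes an energy balancing the two terms,

```latex
E(S) \;=\; E_{\text{photo}}(S) \;+\; \lambda\, E_{\text{smooth}}(S),
```

where $E_{\text{photo}}$ penalizes disagreement between the input images and the surface's reprojections, $E_{\text{smooth}}$ penalizes surface roughness, and $\lambda$ trades off the two. An anisotropic smoothing term weights the penalty differently along and across surface features, so fine facial detail such as pores and wrinkles is not smoothed away.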
The run-time system starts with a pyramidal pairwise stereo matching. Results from lower
resolutions guide the matching at higher resolutions. The face is first segmented based on
cues of background subtraction and skin color. Images from each camera pair are rectified. An
image pyramid is then generated by factor-of-two downsampling using Gaussian convolution, stopping at approximately 150 × 150 pixels for the lowest layer. Then a dense matching is
established between pairwise neighboring cameras, and each layer of the pyramid is processed
as follows: Matches are computed for all pixels on the basis of normalized cross correlation
(NCC) over a square window (3 × 3). The disparity is computed to sub-pixel accuracy and
used to constrain the search area in the following layer. For each pixel, smoothness, uniqueness,
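The per-layer NCC matching described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes rectified images, searches integer disparities only, and omits the sub-pixel refinement and the pyramid-based search-range constraint.

```python
import numpy as np

def ncc(a, b, eps=1e-8):
    """Normalized cross-correlation of two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def best_disparity(left, right, v, u, max_disp, win=1):
    """Integer disparity for pixel (v, u) that maximizes NCC along the
    rectified scanline; win=1 gives the 3x3 window used above."""
    patch = left[v - win:v + win + 1, u - win:u + win + 1]
    scores = []
    for d in range(max_disp + 1):
        if u - d - win < 0:
            break                      # candidate window leaves the image
        cand = right[v - win:v + win + 1, u - d - win:u - d + win + 1]
        scores.append(ncc(patch, cand))
    return int(np.argmax(scores))
```

In a pyramidal scheme, the disparity found at a coarse layer would bound the range of `d` searched at the next finer layer, which is what makes dense matching at full resolution tractable.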