a surface from this volume. A simple example of this approach is the voxel-coloring algo-
rithm and its variants (Seitz and Dyer, 1997; Treuille et al., 2004). The second category of
approaches, based on voxels, level sets, and surface meshes, works by iteratively evolving a
surface to decrease or minimize a cost function. For example, from an initial volume, space
carving progressively removes inconsistent voxels. Other approaches represent the object as
an evolving mesh (Hernandez and Schmitt, 2004; Yu et al., 2006) moving as a function of
internal and external forces. In the third category are image-space methods that estimate a set
of depth maps. To ensure a single consistent 3D object interpretation, they enforce consistency
constraints between depth maps (Kolmogorov and Zabih, 2002; Gargallo and Sturm, 2005) or
merge the set of depth maps into a 3D object as a post process (Narayanan et al., 1998). The
final category groups approaches that first extract and match a set of feature points. A surface
is then fitted to the reconstructed features (Morris and Kanade, 2000; Taylor, 2003). Seitz et al.
(2006) provide an excellent overview and categorization of MVS methods. 3D face reconstruction
approaches use a combination of methods from these categories.
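The consistency constraint used by the image-space methods above can be illustrated concretely: a depth estimate in one view is accepted only if, when the pixel is back-projected to 3D and reprojected into a second view, the second view's depth map agrees. The sketch below is a minimal illustration under assumed pinhole intrinsics K and a relative pose (R, t); the function names are illustrative and not taken from the cited papers.

```python
import numpy as np

def unproject(u, v, depth, K):
    """Back-project pixel (u, v) with a given depth into camera coordinates."""
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    return np.array([x, y, depth])

def consistent(u, v, depth1, K1, K2, R, t, depth_map2, tol=0.01):
    """Check whether a depth estimate from view 1 agrees with view 2.

    R, t map view-1 camera coordinates into view-2 camera coordinates.
    """
    X1 = unproject(u, v, depth1, K1)   # 3D point in the view-1 frame
    X2 = R @ X1 + t                    # the same point in the view-2 frame
    if X2[2] <= 0:
        return False                   # behind the second camera
    p = K2 @ X2                        # project into view 2
    u2, v2 = int(round(p[0] / p[2])), int(round(p[1] / p[2]))
    h, w = depth_map2.shape
    if not (0 <= u2 < w and 0 <= v2 < h):
        return False                   # falls outside view 2
    # Consistent if the depth stored in view 2 matches the reprojected depth.
    return abs(depth_map2[v2, u2] - X2[2]) < tol * X2[2]
```

Real systems aggregate such per-pixel checks over many view pairs, but the core geometric test is this reproject-and-compare step.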
Furukawa and Ponce (2009) proposed an MVS algorithm that outputs accurate models with fine surface detail. It implements multiview stereopsis as a simple match, expand, and filter procedure.
In the matching step, a set of features localized by the Harris and difference-of-Gaussians operators is matched across multiple views, giving a sparse set of patches associated with
salient image regions. From these initial matches, the next two steps are repeated n times (n = 3 in the experiments). In the expansion step, initial matches are spread to nearby pixels to
obtain a dense set of patches. Finally in the filtering step, the visibility constraints are used to
discard incorrect matches lying either in front of or behind the observed surface. The MVS
approach proposed by Bradley et al. (2010) is based on an iterative binocular stereo method
to reconstruct seven surface patches independently, which are then merged into a single high-resolution
mesh. At this stage, face details and surface texture help guide the stereo algorithm. First,
depth maps are created from pairs of adjacent rectified viewpoints. Then the most prominent
distortions between the views are compensated by a scaled-window matching technique. The
resulting depth images are converted to 3D points and fused into a single dense point cloud. A
triangular mesh from the initial point cloud is reconstructed over three steps: down-sampling,
outlier removal, and triangle meshing. Sample reconstruction results of this approach are
shown in Figure 1.7.
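The point-cloud stages of such a pipeline, converting depth maps to 3D points and then filtering outliers before meshing, can be sketched as follows. This is a minimal illustration, not Bradley et al.'s implementation: it assumes a pinhole intrinsics matrix K and uses a simple k-nearest-neighbour statistical filter with brute-force distances (quadratic in the number of points, so only suitable for small clouds).

```python
import numpy as np

def depth_to_points(depth, K):
    """Convert a depth map to an (N, 3) array of camera-space 3D points."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    z = depth.ravel()
    x = (u.ravel() - K[0, 2]) * z / K[0, 0]
    y = (v.ravel() - K[1, 2]) * z / K[1, 1]
    pts = np.column_stack([x, y, z])
    return pts[z > 0]                      # keep only valid depths

def remove_outliers(points, k=8, factor=2.0):
    """Drop points whose mean distance to their k nearest neighbours exceeds
    `factor` times the global average of that statistic."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    d.sort(axis=1)
    mean_knn = d[:, 1:k + 1].mean(axis=1)  # index 0 is the self-distance
    return points[mean_knn < factor * mean_knn.mean()]
```

A production system would fuse points from all views into a common world frame first and would use a spatial index (k-d tree, voxel grid) for the neighbour queries.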
The 3D face acquisition approach proposed by Beeler et al. (2010) takes inspiration from Furukawa and Ponce (2010); the main difference lies in the refinement formulation. The starting point is the established approach of refining recovered 3D data on the basis of a data-driven photo-consistency term and a surface-smoothing term, which has been a long-standing research topic. The approach differs in its use of a second-order anisotropic formulation of the smoothing term, which the authors argue is particularly suited to faces. Camera calibration is achieved in a pre-processing stage.
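In generic terms (this is the common form of such refinement schemes, not the exact functional of Beeler et al.), the refined surface S minimizes an energy balancing the two terms,

```latex
E(S) \;=\; E_{\text{photo}}(S) \;+\; \lambda\, E_{\text{smooth}}(S),
```

where $E_{\text{photo}}$ penalizes disagreement between the input images and the surface's reprojections, $E_{\text{smooth}}$ penalizes surface roughness, and $\lambda$ trades off the two. An anisotropic smoothing term weights the penalty differently along and across surface features, so fine facial detail such as pores and wrinkles is not smoothed away.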
The run-time system starts with a pyramidal pairwise stereo matching. Results from lower
resolutions guide the matching at higher resolutions. The face is first segmented based on
cues of background subtraction and skin color. Images from each camera pair are rectified. An
image pyramid is then generated by factor-of-two downsampling using Gaussian convolution, stopping at approximately 150 × 150 pixels for the lowest layer. Then a dense matching is
established between pairwise neighboring cameras, and each layer of the pyramid is processed
as follows: Matches are computed for all pixels on the basis of normalized cross correlation
(NCC) over a square window (3 × 3). The disparity is computed to sub-pixel accuracy and
used to constrain the search area in the following layer. For each pixel, smoothness, uniqueness,
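The per-layer NCC matching described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes rectified images, searches integer disparities only, and omits the sub-pixel refinement and the pyramid-based search-range constraint.

```python
import numpy as np

def ncc(a, b, eps=1e-8):
    """Normalized cross-correlation of two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def best_disparity(left, right, v, u, max_disp, win=1):
    """Integer disparity for pixel (v, u) that maximizes NCC along the
    rectified scanline; win=1 gives the 3x3 window used above."""
    patch = left[v - win:v + win + 1, u - win:u + win + 1]
    scores = []
    for d in range(max_disp + 1):
        if u - d - win < 0:
            break                      # candidate window leaves the image
        cand = right[v - win:v + win + 1, u - d - win:u - d + win + 1]
        scores.append(ncc(patch, cand))
    return int(np.argmax(scores))
```

In a pyramidal scheme, the disparity found at a coarse layer would bound the range of `d` searched at the next finer layer, which is what makes dense matching at full resolution tractable.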