Three parameters encode normal information, while the remaining three contain tangential
motion information. On the basis of the estimated local motion parameters, the whole
mesh is then deformed by minimizing the sum of three energy terms:
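$$\sum_i \Bigl[ \left| v_i^f - \hat{v}_i^f \right|^2 + \eta_1 \left| \left( \zeta_2 \Delta^2 - \zeta_1 \Delta \right) v_i^f \right|^2 + \eta_2 E_r(v_i^f) \Bigr]. \tag{1.14}$$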
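To make the structure of Eq. (1.14) concrete, the following minimal sketch evaluates the three terms for a small mesh. It is an illustration under stated assumptions, not the authors' implementation: the Laplacian of a local surface parameterization is replaced by a simple umbrella operator (neighbor average minus the vertex), the smoothness term is assumed to have the form $|(\zeta_2 \Delta^2 - \zeta_1 \Delta) v_i^f|^2$, the rigidity values are passed in precomputed, and all function and parameter names are hypothetical. The weights `eta1` and `eta2` are left as required arguments because no values for them are given here.

```python
def umbrella_laplacian(values, neighbors):
    """Neighbor average minus the value itself, per vertex.

    Assumed stand-in for the discrete Laplacian; every vertex needs
    at least one neighbor.
    """
    result = []
    for i, value in enumerate(values):
        ring = neighbors[i]
        mean = [sum(values[j][k] for j in ring) / len(ring) for k in range(3)]
        result.append([mean[k] - value[k] for k in range(3)])
    return result


def total_energy(v, v_hat, neighbors, rigidity, eta1, eta2, zeta1=0.6, zeta2=0.4):
    """Sum over vertices of data + eta1 * smoothness + eta2 * rigidity."""
    lap = umbrella_laplacian(v, neighbors)      # Laplacian of the positions
    bilap = umbrella_laplacian(lap, neighbors)  # Laplacian applied twice
    energy = 0.0
    for i in range(len(v)):
        # Data term: squared distance to the locally estimated position.
        data = sum((v[i][k] - v_hat[i][k]) ** 2 for k in range(3))
        # Smoothness term: squared norm of (zeta2 * Lap^2 - zeta1 * Lap) v_i.
        smooth = sum((zeta2 * bilap[i][k] - zeta1 * lap[i][k]) ** 2 for k in range(3))
        energy += data + eta1 * smooth + eta2 * rigidity[i]
    return energy
```

In practice such an energy would be handed to a conjugate gradient optimizer over all vertex positions; the sketch only shows how the terms combine.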
The first term, the data term, measures the squared distance between the vertex position
$v_i^f$ and the position $\hat{v}_i^f$ estimated by the local estimation process. The second
uses the discrete Laplacian operator $\Delta$ of a local parameterization of the surface in
$v_i$ to enforce smoothness; the values $\zeta_1 = 0.6$ and $\zeta_2 = 0.4$ are used in all
experiments (Furukawa and Ponce, 2009). This term is very similar to the Laplacian regularizer
used in many other algorithms (Ponce, 2008). The third term is also used for regularization,
and it enforces local tangential rigidity with no stretch, shrink, or shear. The total energy
is minimized with respect to the 3D positions of all the vertices by a conjugate gradient
method. In the case of deformable surfaces such as human faces, a nonstatic target edge length
is computed on the basis of the non-rigid tangential deformation from the reference frame to
the current one at each vertex. The estimation of the tangential deformation is performed at
each frame before starting the motion estimation, and the parameters are fixed within a frame.
Thus, the tangential rigidity term $E_r(v_i^f)$ for a vertex $v_i^f$ in the global mesh
deformation is given by
$$E_r(v_i^f) = \sum_{v_j \in N(v_i)} \max\left[ 0,\; \left( e_{ij} - \hat{e}_{ij} \right)^2 - \tau^2 \right], \tag{1.15}$$
which is the sum of squared differences between the actual edge lengths $e_{ij}$ and the edge
lengths $\hat{e}_{ij}$ predicted from the reference frame to the current frame. The term $\tau$
is used to make the penalty zero when the deviation is small, so that this regularization term
is enforced only when the data term is unreliable and the error is large. In all experiments,
$\tau$ is set to be 0.2 times the average edge length of the mesh at the first frame. Figure 1.8
shows some results of the motion capture approach proposed in Furukawa and Ponce (2009).
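The rigidity term of Eq. (1.15) and its dead zone can be sketched in a few lines. This is a minimal illustration, assuming the penalty has the form max(0, (e_ij - ê_ij)² - τ²); the function names, the dictionary of predicted edge lengths keyed by vertex pairs, and the data layout are hypothetical and not taken from Furukawa and Ponce (2009).

```python
import math


def dead_zone_threshold(first_frame_edge_lengths, factor=0.2):
    """tau = factor * average edge length of the first-frame mesh."""
    return factor * sum(first_frame_edge_lengths) / len(first_frame_edge_lengths)


def tangential_rigidity(i, positions, neighbors, predicted_lengths, tau):
    """E_r for vertex i: sum over neighbors j of max(0, (e_ij - e_hat_ij)^2 - tau^2)."""
    total = 0.0
    for j in neighbors[i]:
        actual = math.dist(positions[i], positions[j])  # current edge length e_ij
        deviation = actual - predicted_lengths[(i, j)]   # e_ij - e_hat_ij
        total += max(0.0, deviation ** 2 - tau ** 2)     # zero inside the dead zone
    return total
```

With this form, an edge whose length deviates from its prediction by less than `tau` contributes nothing, so the term only becomes active for large deviations.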
Finally, after surface deformation, the residuals of the data and tangential terms are used
to filter out erroneous motion estimates. Concretely, these values are first smoothed, and a
smoothed local motion estimate is deemed an outlier if at least one of the two residuals exceeds
a given threshold. These three steps are iterated a couple of times to complete tracking in each
frame, with the local motion estimation step applied only to vertices whose parameters have
not already been estimated or filtered out.
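The filtering step can be illustrated as follows. The text does not say how the residuals are smoothed or which thresholds are used, so this sketch assumes a simple average over each vertex and its mesh neighbors and takes the two thresholds as arguments; all names are hypothetical.

```python
def smooth_over_neighbors(values, neighbors):
    """Average each per-vertex value with those of its neighbors (assumed smoothing)."""
    smoothed = []
    for i, value in enumerate(values):
        ring = [value] + [values[j] for j in neighbors[i]]
        smoothed.append(sum(ring) / len(ring))
    return smoothed


def find_outliers(data_residuals, tangential_residuals, neighbors,
                  data_threshold, tangential_threshold):
    """Flag a vertex if either smoothed residual exceeds its threshold."""
    data = smooth_over_neighbors(data_residuals, neighbors)
    tangential = smooth_over_neighbors(tangential_residuals, neighbors)
    return [d > data_threshold or t > tangential_threshold
            for d, t in zip(data, tangential)]
```

Vertices flagged here would have their local motion estimates discarded before the next iteration of the three steps.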
The face capture framework proposed by Bradley et al. (2010) operates without the use of
markers and consists of three main components: acquisition, multiview reconstruction, and
geometry and texture tracking. The acquisition stage uses 14 high-definition video cameras
arranged in seven binocular stereo pairs. At the multiview reconstruction stage, each pair
captures a highly detailed small patch of the face surface under bright ambient light. This stage
uses an iterative binocular stereo method to reconstruct seven surface patches independently,
which are then merged into a single high-resolution mesh; the stereo algorithm is guided by face
details and yields meshes of roughly 1 million polygons. First, depth maps are created from pairs of