from the video images is computed. Drift caused by inevitable optical flow error is detected
in the per-frame texture maps and corrected in the geometry. Also, the mapping is guided by
an edge-based mouth-tracking process to account for the high-speed motion of the mouth during speech.
Beeler et al. (2011) extend their MVS face acquisition system, discussed in Section 1.3,
to facial motion capture. Like Bradley's solution, theirs requires no makeup; the temporally
varying texture can be derived directly from the captured video. The computation is
parallelizable, so long sequences can be reconstructed efficiently using a multicore
implementation. The high-quality results derive from two innovations. The first is a robust tracking
algorithm specifically adapted for short sequences that integrates tracking in image space and
uses the integrated result to propagate a single reference mesh to each target frame. The second
addresses long sequences and employs the “anchor frame” concept, which is based on
the observation that a lengthy facial performance contains many frames similar in appearance.
One frame is defined as the reference frame, and other frames similar to it are
marked as anchor frames. The tracker then computes the flow from the reference to each
anchor independently with a high level of measurement accuracy. The proposed framework
operates in five stages:
1. Stage 1: Computation of Initial Meshes - Each frame is processed independently to generate
a first estimate of the mesh.
2. Stage 2: Anchoring - The reference frame is manually identified. Frames similar to the
reference frame are detected automatically and labeled as anchor frames.
3. Stage 3: Image-Space Tracking - Image pixels are tracked from the reference frame to
anchor frames and then sequentially between non-anchor frames and the nearest anchor
frame.
4. Stage 4: Mesh Propagation - On the basis of the tracking results from the previous stage, a
reference mesh is propagated to all frames in the sequence.
5. Stage 5: Mesh Refinement - The initial propagation from Stage 4 is refined to enforce
consistency with the image data.
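The anchoring and tracking order of Stages 2 and 3 can be illustrated with a small sketch. It is not the implementation of Beeler et al.: the similarity measure (mean squared pixel difference), the threshold, and the function names `similarity_error`, `find_anchors`, and `tracking_schedule` are all assumptions made for illustration, and frames are represented as flat lists of grayscale values.

```python
from typing import List, Sequence, Tuple


def similarity_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared difference between two flattened grayscale images.

    Assumed stand-in for whatever similarity measure the real system uses.
    """
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def find_anchors(frames: Sequence[Sequence[float]], ref_index: int,
                 threshold: float) -> List[int]:
    """Stage 2 (sketch): frames close enough to the reference become anchors.

    The reference frame itself has zero error, so it is always in the result.
    """
    ref = frames[ref_index]
    return [i for i, frame in enumerate(frames)
            if similarity_error(frame, ref) <= threshold]


def tracking_schedule(n_frames: int, ref_index: int,
                      anchors: Sequence[int]) -> List[Tuple[int, int]]:
    """Stage 3 (sketch): ordered (source, target) pairs for image-space tracking.

    First the reference is tracked to every anchor independently; then each
    non-anchor frame is reached frame by frame from its nearest anchor.
    """
    anchor_list = sorted(set(anchors) | {ref_index})
    steps = [(ref_index, a) for a in anchor_list if a != ref_index]

    pending = []
    for i in range(n_frames):
        if i in anchor_list:
            continue
        nearest = min(anchor_list, key=lambda a: abs(a - i))
        direction = 1 if nearest < i else -1
        # The source is the neighbouring frame on the side of the nearest anchor.
        pending.append((abs(nearest - i), i - direction, i))

    # Processing by increasing distance guarantees the source is already tracked.
    pending.sort()
    steps.extend((src, dst) for _, src, dst in pending)
    return steps
```

Because each anchor is tracked directly from the reference, errors do not accumulate across the whole sequence; sequential tracking only spans the short runs of frames between anchors.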
1.4.2 Photometric Stereo
Photometric stereo is a technique in computer vision for estimating the surface normals of
an object by observing it under different lighting conditions. Face surface normals can be
estimated in the same way, provided that the face is observed
under different lighting conditions. For instance, in three-source photometric stereo, three
images of the face are given, taken from the same viewpoint and illuminated by three light
sources. These light sources usually emit the same light spectrum from three non-coplanar
directions. If an orthographic camera model is assumed, the world coordinate system can be
aligned so that the xy plane coincides with the image plane and the z axis corresponds to the viewing
direction. Hence, the surface in front of the camera can be defined as the height Z(x, y). Now,
assuming that ∇Z is the gradient of this function with respect to x and y, the vector locally
normal to the surface at (x, y) can be defined as
$$\mathbf{n} = \frac{1}{\sqrt{1 + |\nabla Z|^{2}}} \begin{pmatrix} \nabla Z \\ -1 \end{pmatrix}. \tag{1.16}$$
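As a rough illustration of Eq. (1.16), the sketch below computes unit normals from a discretely sampled height map. Several choices are assumptions rather than part of the text: the gradient is approximated by finite differences, x is taken along columns and y along rows, the normal is oriented as (∇Z, −1) so that it points toward the camera, and the function name `surface_normals` and the `spacing` parameter are hypothetical.

```python
import math
from typing import List, Tuple

Normal = Tuple[float, float, float]


def surface_normals(Z: List[List[float]], spacing: float = 1.0) -> List[List[Normal]]:
    """Unit normals of a height map Z[row][col], following n ~ (grad Z, -1).

    Assumes x runs along columns and y along rows. Central differences are
    used in the interior and one-sided differences at the borders.
    """
    rows, cols = len(Z), len(Z[0])
    if rows < 2 or cols < 2:
        raise ValueError("height map must be at least 2 x 2")

    normals: List[List[Normal]] = []
    for r in range(rows):
        row: List[Normal] = []
        for c in range(cols):
            c0, c1 = max(c - 1, 0), min(c + 1, cols - 1)
            r0, r1 = max(r - 1, 0), min(r + 1, rows - 1)
            dzdx = (Z[r][c1] - Z[r][c0]) / ((c1 - c0) * spacing)
            dzdy = (Z[r1][c] - Z[r0][c]) / ((r1 - r0) * spacing)
            # Normalisation factor 1 / sqrt(1 + |grad Z|^2).
            scale = 1.0 / math.sqrt(1.0 + dzdx * dzdx + dzdy * dzdy)
            row.append((dzdx * scale, dzdy * scale, -scale))
        normals.append(row)
    return normals
```

For a flat surface the gradient vanishes and every normal is (0, 0, −1), that is, parallel to the viewing axis; a tilted plane yields a constant normal whose x and y components grow with the slope.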