algorithms initially designed to work for a frontal point of view under any other head pose. The analysis algorithm's parameters and variables are no longer defined over the 2D image plane, but over the realistic 3D head model. This solution keeps face-feature analysis stable as the speaker's pose changes. Although the system analyzes the speaker's own clone, the parameters it obtains are general enough to be synthesized on other models or avatars. (See Figure 8.)
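A minimal sketch of this idea, with hypothetical names and a simple pinhole camera (none of which come from the cited system): feature points are defined once on the 3D head model, and only their image-plane projections change with the head pose, so the analysis parameters stay attached to the model rather than to the 2D image.

```python
import numpy as np

def rotation_matrix(yaw, pitch, roll):
    """Build a 3D rotation matrix from head-pose Euler angles (radians)."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])   # yaw
    Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])   # pitch
    Rz = np.array([[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]])   # roll
    return Rz @ Rx @ Ry

def project_feature(point_3d, pose_R, pose_t, focal=800.0):
    """Project a feature point defined on the 3D head model into the
    image plane under the current head pose (pinhole camera model)."""
    p = pose_R @ point_3d + pose_t
    return focal * p[:2] / p[2]
```

Because the feature is anchored to the model, re-running the projection with an updated pose is all that is needed to follow it through a head rotation.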
Methods that Use Explicit Face Synthesis During the Image Analysis
Some face motion analysis techniques use the synthesized image of the head model to control or refine the analysis procedure. In general, systems that use synthesized feedback in their analysis need a very realistic head model of the speaker, precise control of the synthesis, and knowledge of the conditions under which the face is recorded.
Li, Roivainen and Forchheimer (1993) presented one of the first works to use resynthesized feedback. Using a 3D model — Candide — their approach is characterized by a feedback loop connecting computer vision and computer graphics. They show that embedding synthesis techniques into the analysis phase greatly improves the performance of motion estimation. A slightly different solution is given by Ezzat and Poggio (1996a, 1996b). In their articles, they describe image-based modeling techniques that make it possible to create photo-realistic computer models of real human faces. The model they use is built from example views of the face, bypassing the need for any 3D computer graphics. To generate motion for this model, they use an analysis-by-synthesis algorithm, which extracts a set of high-level parameters from an image sequence involving facial movement using embedded image-based models. The parameters of the models are perturbed in a local and independent manner for each image until a correspondence-based error metric is minimized. Their system is restricted to a limited number of expressions.
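The perturb-and-compare loop described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the `synthesize` renderer is a stand-in for their image-based model, and a sum of squared differences stands in for their correspondence-based error metric.

```python
import numpy as np

def analysis_by_synthesis(observed, synthesize, params, step=0.05, iters=50):
    """Perturb each model parameter locally and independently, keeping a
    perturbation only if the resulting synthesized image matches the
    observed image better (lower error)."""
    def error(p):
        # Stand-in error metric: sum of squared pixel differences.
        return np.sum((synthesize(p) - observed) ** 2)

    params = np.asarray(params, dtype=float)
    best = error(params)
    for _ in range(iters):
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                trial = params.copy()
                trial[i] += delta
                e = error(trial)
                if e < best:
                    params, best, improved = trial, e, True
        if not improved:
            step *= 0.5  # no parameter helped; refine the local search
    return params
```

With a trivial identity "renderer" this loop recovers the target parameters; with a real image synthesizer, each evaluation of `error` would render a candidate face and compare it against the recorded frame.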
More recent work achieves much more realistic results with three-dimensional models. Eisert and Girod (1998), for instance, present a system that estimates 3D motion from image sequences showing head-and-shoulder scenes for video-telephony and teleconferencing applications. They use a very realistic 3D head model of the person in the video. The model constrains the motion and deformation of the face to a set of facial animation parameters (FAPs) defined by the MPEG-4 standard. Using the model, they obtain a description of both global (head pose) and local 3D head motion as a function of unknown facial parameters. Combining