algorithms initially designed to work for a frontal point of view under any other head pose. The analysis algorithm's parameters and variables are no longer defined over the 2D image plane, but over the realistic 3D head model. This solution keeps face-feature analysis stable as the speaker's pose changes. Although the system analyzes the speaker's own clone, the parameters it obtains are general enough to be synthesized on other models or avatars. (See Figure 8.)
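A minimal sketch of this idea, with hypothetical names and a simple pinhole camera (none of which come from the cited system): feature points are defined once on the 3D head model, and only their image-plane projections change with the head pose, so the analysis parameters stay attached to the model rather than to the 2D image.

```python
import numpy as np

def rotation_matrix(yaw, pitch, roll):
    """Build a 3D rotation matrix from head-pose Euler angles (radians)."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])   # yaw
    Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])   # pitch
    Rz = np.array([[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]])   # roll
    return Rz @ Rx @ Ry

def project_feature(point_3d, pose_R, pose_t, focal=800.0):
    """Project a feature point defined on the 3D head model into the
    image plane under the current head pose (pinhole camera model)."""
    p = pose_R @ point_3d + pose_t
    return focal * p[:2] / p[2]
```

Because the feature is anchored to the model, re-running the projection with an updated pose is all that is needed to follow it through a head rotation.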
Methods that Use Explicit Face Synthesis During the Image Analysis
Some face motion analysis techniques use the synthesized image of the head model to control or refine the analysis procedure. In general, systems that use synthesized feedback in their analysis need a very realistic head model of the speaker, precise control of the synthesis, and knowledge of the conditions under which the face is recorded.
Li, Roivainen and Forchheimer (1993) presented one of the first works to use resynthesized feedback. Using a 3D model — Candide — their approach is characterized by a feedback loop connecting computer vision and computer graphics. They show that embedding synthesis techniques into the analysis phase greatly improves the performance of motion estimation. A slightly different solution is given by Ezzat and Poggio (1996a, 1996b). In their articles, they describe image-based modeling techniques that make it possible to create photo-realistic computer models of real human faces. The model they use is built from example views of the face, bypassing the need for any 3D computer graphics. To generate motion for this model, they use an analysis-by-synthesis algorithm, which extracts a set of high-level parameters from an image sequence involving facial movement using embedded image-based models. The parameters of the models are perturbed in a local and independent manner for each image until a correspondence-based error metric is minimized. Their system is restricted to a limited number of expressions.
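The perturb-and-compare loop described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the `synthesize` renderer is a stand-in for their image-based model, and a sum of squared differences stands in for their correspondence-based error metric.

```python
import numpy as np

def analysis_by_synthesis(observed, synthesize, params, step=0.05, iters=50):
    """Perturb each model parameter locally and independently, keeping a
    perturbation only if the resulting synthesized image matches the
    observed image better (lower error)."""
    def error(p):
        # Stand-in error metric: sum of squared pixel differences.
        return np.sum((synthesize(p) - observed) ** 2)

    params = np.asarray(params, dtype=float)
    best = error(params)
    for _ in range(iters):
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                trial = params.copy()
                trial[i] += delta
                e = error(trial)
                if e < best:
                    params, best, improved = trial, e, True
        if not improved:
            step *= 0.5  # no parameter helped; refine the local search
    return params
```

With a trivial identity "renderer" this loop recovers the target parameters; with a real image synthesizer, each evaluation of `error` would render a candidate face and compare it against the recorded frame.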
More recent work achieves much more realistic results with three-dimensional models. Eisert and Girod (1998), for instance, present a system that estimates 3D motion from image sequences showing head-and-shoulder scenes for video-telephony and teleconferencing applications. They use a very realistic 3D head model of the person in the video. The model constrains the motion and deformation of the face to a set of facial animation parameters (FAPs) defined by the MPEG-4 standard. Using the model, they obtain a description of both global (head pose) and local 3D head motion as a function of unknown facial parameters. Combining