A video camera captures images of the head and shoulders of a person. The encoder analyzes the frames and estimates the person's 3-D motion and facial expressions using a 3-D head model. This yields a set of facial animation parameters (FAPs) that describes, together with the 3-D model, the current appearance of the person. Only a few parameters have to be encoded and transmitted, resulting in very low bit-rates. The head model itself has to be transmitted only once, and not at all if it has already been stored at the decoder in a previous session. At the decoder, the parameters are used to deform the head model according to the person's facial expressions, and the original video frame is finally approximated by rendering the 3-D model at the new position.
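This analysis-synthesis loop can be summarized in code. The following Python sketch is purely illustrative: the estimator, quantizer, model, and renderer objects and their methods are hypothetical placeholders, not an actual API from the literature.

```python
def encode_sequence(frames, head_model, estimator, quantizer):
    """Encoder side: fit the 3-D head model to each frame and
    transmit only the resulting pose and FAP parameters."""
    bitstream = []
    for frame in frames:
        # Estimate global 3-D motion and facial expression (FAPs)
        pose, faps = estimator.estimate(frame, head_model)  # hypothetical call
        bitstream.append(quantizer.encode(pose, faps))      # a few bytes per frame
    return bitstream

def decode_sequence(bitstream, head_model, quantizer, renderer):
    """Decoder side: deform the stored head model with the received
    FAPs and approximate each frame by rendering at the new pose."""
    frames = []
    for packet in bitstream:
        pose, faps = quantizer.decode(packet)
        mesh = head_model.deform(faps)              # apply facial expression
        frames.append(renderer.render(mesh, pose))  # synthesize the view
    return frames
```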
The use of model-based coding techniques in communication scenarios leads to extremely low bit-rates of only a few kbit/s for the transmission of head-and-shoulder image sequences. This also enables video streaming over low-bandwidth channels to mobile devices such as PDAs or smart phones. The rendering complexity is comparable to that of a hybrid video codec and, in experiments, frame rates of 30 Hz have been achieved on an iPAQ PDA.
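A back-of-the-envelope calculation makes these numbers plausible. The parameter counts below are illustrative assumptions (MPEG-4, for instance, defines 68 FAPs); real coders additionally exploit temporal prediction:

```python
# Illustrative bit-rate estimate for parameter-based transmission.
n_faps       = 68   # assumption: MPEG-4 defines 68 FAPs
bits_per_fap = 6    # assumption: coarse quantization per parameter
frame_rate   = 30   # Hz

raw = n_faps * bits_per_fap * frame_rate  # upper bound, no prediction
# Predicting parameters from the previous frame and coding only the
# FAPs that actually change typically reduces this by an order of
# magnitude, down to a few kbit/s.
print(f"upper bound: {raw / 1000:.1f} kbit/s")  # ~12.2 kbit/s
```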
On the other hand, the intensive exploitation of a priori knowledge restricts the applicability to special scenes that can be described by 3-D models available at the decoder. In a video-phone scenario, for example, unmodeled objects such as a hand in front of the face simply do not show up unless they are explicitly added to the virtual scene. To obtain a codec that can encode arbitrary scenes, hybrid coding techniques can be incorporated, increasing the bit-rate but ensuring generality for unknown objects. The model-aided codec is an example of such an approach (Eisert et al., 2000).
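The principle behind such a model-aided codec can be illustrated as a per-block mode decision: the frame rendered from the 3-D head model serves as one prediction candidate alongside conventional motion-compensated prediction, and the encoder picks whichever fits better. This is a simplified sketch under that assumption, not the actual codec of Eisert et al.:

```python
import numpy as np

def choose_prediction(block, model_pred, mc_pred):
    """Pick, per block, the predictor with the smaller SAD.

    block, model_pred, mc_pred: equally sized image blocks (numpy arrays).
    Returns the chosen predictor and a 1-bit mode flag for the bitstream."""
    sad_model = np.abs(block.astype(int) - model_pred.astype(int)).sum()
    sad_mc    = np.abs(block.astype(int) - mc_pred.astype(int)).sum()
    if sad_model <= sad_mc:
        return model_pred, 0  # model-based prediction wins
    return mc_pred, 1         # hybrid fallback, e.g., for an unmodeled hand

# Only the residual (block minus predictor) is then transform-coded as usual.
```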
Model-based coding techniques, however, also offer additional features besides low bit-rates, enabling many new applications that cannot be achieved with traditional hybrid coding methods. In immersive video-conferencing (Kauff et al., 2002), multiple participants who are located at different places can be seated at a joint virtual table. Due to the 3-D representation of the objects, pose modification for correct seating positions can easily be accomplished, as well as view-point corrections according to the user's motion.
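Because every participant exists as a 3-D model, such a pose modification reduces to a rigid transform of the model's vertices before rendering. A minimal sketch, assuming the mesh is given as an N x 3 vertex array in head-centered coordinates:

```python
import numpy as np

def correct_pose(vertices, yaw):
    """Rotate a head mesh (N x 3 vertex array) about the vertical axis
    so the participant faces the desired seat at the virtual table."""
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[  c, 0.0,   s],
                    [0.0, 1.0, 0.0],
                    [ -s, 0.0,   c]])  # rotation about the y (vertical) axis
    return vertices @ rot.T

# E.g., turn the rendered head 20 degrees toward a neighbour:
# corrected = correct_pose(mesh_vertices, np.radians(20))
```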
By replacing the 3-D model of one person with a different one, other people can be animated with the expressions of an actor, as shown in the next section. Similarly, avatars can be
driven to create user-friendly man-machine interfaces, where a human-like
character interacts with the user. Analyzing the user with a web cam also gives
the computer feedback about the user's emotions and intentions (Picard, 1997).
Other cues in the face can assist the computer-aided diagnosis and treatment of patients in medical applications. For example, asymmetry in facial expressions caused by facial palsy can be measured three-dimensionally (Frey et al., 1999), and craniofacial syndromes can be detected by 3-D analysis of facial feature positions (Hammond et al., 2001).
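The kind of 3-D asymmetry measure used in such studies can be sketched as follows: mirror the landmarks of one facial half across the midsagittal plane and measure how far they fall from their counterparts on the other half. The landmark correspondence and the choice of x = 0 as the midplane are simplifying assumptions:

```python
import numpy as np

def asymmetry(left_pts, right_pts):
    """Mean 3-D distance between left landmarks mirrored across the
    midsagittal plane (x = 0) and their right-side counterparts.

    left_pts, right_pts: (N, 3) arrays of corresponding landmarks in a
    head-centered coordinate system. Returns 0 for a perfectly
    symmetric face; larger values indicate stronger asymmetry."""
    mirrored = left_pts * np.array([-1.0, 1.0, 1.0])  # flip x
    return float(np.linalg.norm(mirrored - right_pts, axis=1).mean())
```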
These examples indicate the wide variety of applications for model-based facial analysis and synthesis techniques.