A video camera captures images of the head and shoulders of a person. The encoder analyzes the frames and estimates the person's 3-D motion and facial expressions using a 3-D head model. This yields a set of facial animation parameters (FAPs) that describes, together with the 3-D model, the current appearance of the person. Only a few parameters have to be encoded and transmitted, resulting in very low bit-rates. The head model itself has to be transmitted only once, and not at all if it has already been stored at the decoder in a previous session. At the decoder, the parameters are used to deform the head model according to the person's facial expressions, and the original video frame is finally approximated by rendering the 3-D model at the new position.
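This analysis-synthesis loop can be summarized in code. The following Python sketch is purely illustrative: the estimator, quantizer, model, and renderer objects and their methods are hypothetical placeholders, not an actual API from the literature.

```python
def encode_sequence(frames, head_model, estimator, quantizer):
    """Encoder side: fit the 3-D head model to each frame and
    transmit only the resulting pose and FAP parameters."""
    bitstream = []
    for frame in frames:
        # Estimate global 3-D motion and facial expression (FAPs)
        pose, faps = estimator.estimate(frame, head_model)  # hypothetical call
        bitstream.append(quantizer.encode(pose, faps))      # a few bytes per frame
    return bitstream

def decode_sequence(bitstream, head_model, quantizer, renderer):
    """Decoder side: deform the stored head model with the received
    FAPs and approximate each frame by rendering at the new pose."""
    frames = []
    for packet in bitstream:
        pose, faps = quantizer.decode(packet)
        mesh = head_model.deform(faps)              # apply facial expression
        frames.append(renderer.render(mesh, pose))  # synthesize the view
    return frames
```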
The use of model-based coding techniques in communication scenarios leads to extremely low bit-rates of only a few kbit/s for the transmission of head-and-shoulder image sequences. This also enables video streaming over low-bandwidth channels to mobile devices such as PDAs or smart phones. The rendering complexity is comparable to that of a hybrid video codec and, in experiments, frame rates of 30 Hz have been achieved on an iPAQ PDA.
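A back-of-the-envelope calculation makes these numbers plausible. The parameter counts below are illustrative assumptions (MPEG-4, for instance, defines 68 FAPs); real coders additionally exploit temporal prediction:

```python
# Illustrative bit-rate estimate for parameter-based transmission.
n_faps       = 68   # assumption: MPEG-4 defines 68 FAPs
bits_per_fap = 6    # assumption: coarse quantization per parameter
frame_rate   = 30   # Hz

raw = n_faps * bits_per_fap * frame_rate  # upper bound, no prediction
# Predicting parameters from the previous frame and coding only the
# FAPs that actually change typically reduces this by an order of
# magnitude, down to a few kbit/s.
print(f"upper bound: {raw / 1000:.1f} kbit/s")  # ~12.2 kbit/s
```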
On the other hand, the intensive exploitation of a priori knowledge restricts the applicability to special scenes that can be described by 3-D models available at the decoder. In a video-phone scenario, for example, unmodeled objects such as a hand in front of the face simply do not show up unless they are explicitly added to the virtual scene. To obtain a codec that can encode arbitrary scenes, hybrid coding techniques can be incorporated, increasing the bit-rate but ensuring generality for unknown objects. The model-aided codec is an example of such an approach (Eisert et al., 2000).
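The principle behind such a model-aided codec can be illustrated as a per-block mode decision: the frame rendered from the 3-D head model serves as one prediction candidate alongside conventional motion-compensated prediction, and the encoder picks whichever fits better. This is a simplified sketch under that assumption, not the actual codec of Eisert et al.:

```python
import numpy as np

def choose_prediction(block, model_pred, mc_pred):
    """Pick, per block, the predictor with the smaller SAD.

    block, model_pred, mc_pred: equally sized image blocks (numpy arrays).
    Returns the chosen predictor and a 1-bit mode flag for the bitstream."""
    sad_model = np.abs(block.astype(int) - model_pred.astype(int)).sum()
    sad_mc    = np.abs(block.astype(int) - mc_pred.astype(int)).sum()
    if sad_model <= sad_mc:
        return model_pred, 0  # model-based prediction wins
    return mc_pred, 1         # hybrid fallback, e.g., for an unmodeled hand

# Only the residual (block minus predictor) is then transform-coded as usual.
```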
Model-based coding techniques, however, also offer additional features besides low bit-rates, enabling many new applications that cannot be achieved with traditional hybrid coding methods. In immersive video-conferencing (Kauff et al., 2002), multiple participants who are located at different places can be seated at a joint virtual table. Due to the 3-D representation of the objects, pose modification for correct seating positions can easily be accomplished, as well as view-point corrections according to the user's motion.
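Because every participant exists as a 3-D model, such a pose modification reduces to a rigid transform of the model's vertices before rendering. A minimal sketch, assuming the mesh is given as an N x 3 vertex array in head-centered coordinates:

```python
import numpy as np

def correct_pose(vertices, yaw):
    """Rotate a head mesh (N x 3 vertex array) about the vertical axis
    so the participant faces the desired seat at the virtual table."""
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[  c, 0.0,   s],
                    [0.0, 1.0, 0.0],
                    [ -s, 0.0,   c]])  # rotation about the y (vertical) axis
    return vertices @ rot.T

# E.g., turn the rendered head 20 degrees toward a neighbour:
# corrected = correct_pose(mesh_vertices, np.radians(20))
```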
By replacing the 3-D model of one person with a different one, other people can be animated with the expressions of an actor, as shown in the next section. Similarly, avatars can be
driven to create user-friendly man-machine interfaces, where a human-like
character interacts with the user. Analyzing the user with a web cam also gives
the computer feedback about the user's emotions and intentions (Picard, 1997).
Other cues in the face can assist the computer-aided diagnosis and treatment of patients in medical applications. For example, asymmetry in facial expressions caused by facial palsy can be measured three-dimensionally (Frey et al., 1999), and craniofacial syndromes can be detected by 3-D analysis of facial feature positions (Hammond et al., 2001).
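The kind of 3-D asymmetry measure used in such studies can be sketched as follows: mirror the landmarks of one facial half across the midsagittal plane and measure how far they fall from their counterparts on the other half. The landmark correspondence and the choice of x = 0 as the midplane are simplifying assumptions:

```python
import numpy as np

def asymmetry(left_pts, right_pts):
    """Mean 3-D distance between left landmarks mirrored across the
    midsagittal plane (x = 0) and their right-side counterparts.

    left_pts, right_pts: (N, 3) arrays of corresponding landmarks in a
    head-centered coordinate system. Returns 0 for a perfectly
    symmetric face; larger values indicate stronger asymmetry."""
    mirrored = left_pts * np.array([-1.0, 1.0, 1.0])  # flip x
    return float(np.linalg.norm(mirrored - right_pts, axis=1).mean())
```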
These examples indicate the wide variety of applications for model-based facial analysis and synthesis techniques.