Game Development Reference
In-Depth Information
An extreme example where the model takes absolute preference is the mouth
cavity. The interior of the mouth is part of the model and includes, e.g., the
skin connecting the teeth and the interior parts of the lips. Typically, scarcely any
3D data will be captured for this region, and the data that are captured tend to be
of low quality. The upper row of teeth is fixed rigidly to the model and has already
received its position through the first step (the global transformation of the
model, possibly with a further adjustment by the user). The lower teeth follow
the jaw motion, which is modeled as a rotation about the midpoint between the
points where the jaw is attached to the skull, plus a translation. The motion itself
is quantified by observing the motion of a point on the chin, standardized as
MPEG-4 feature point 2.10. These points have also been defined on the generic model,
as can be seen in Figure 9, and can be located automatically after the morph.
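As a minimal sketch of the jaw-motion step described above, the opening angle can be estimated from the tracked chin point (MPEG-4 point 2.10) relative to a pivot at the midpoint between the two jaw-attachment points. All names and the function itself are illustrative, not taken from the source:

```python
import numpy as np

def jaw_angle(chin_neutral, chin_observed, attach_left, attach_right):
    """Estimate the jaw rotation angle (radians) from the tracked chin point.

    The pivot is the midpoint between the two points where the jaw attaches
    to the skull, as described in the text. This is a hypothetical helper,
    ignoring the translational component of the jaw motion.
    """
    pivot = (attach_left + attach_right) / 2.0
    v0 = chin_neutral - pivot    # chin direction in the neutral pose
    v1 = chin_observed - pivot   # chin direction in the observed frame
    cos_a = np.dot(v0, v1) / (np.linalg.norm(v0) * np.linalg.norm(v1))
    return np.arccos(np.clip(cos_a, -1.0, 1.0))
```

A residual translation could then be read off as the displacement left over after rotating the neutral chin point by this angle about the pivot.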
Note that all of these settings, such as the type and size of the RBFs and
whether or not vertices are cylindrically mapped, are defined only once, as
attributes of the generic model's vertices.
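To make the RBF-driven adaptation concrete, the following sketch interpolates a displacement field from landmark correspondences and applies it to all vertices. It assumes a single Gaussian kernel for every vertex; in the approach described above, the kernel type and size (and the cylindrical-mapping flag) would instead be read from the per-vertex attributes of the generic model:

```python
import numpy as np

def rbf_deform(vertices, landmarks_src, landmarks_dst, sigma=1.0):
    """Deform `vertices` so the source landmarks move onto the target ones.

    Minimal sketch with Gaussian RBFs; a real implementation would pick
    kernel type and size per vertex, as stored in the generic model.
    """
    def kernel(a, b):
        # Gaussian kernel on pairwise squared distances
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    # Solve for per-landmark weights that reproduce the landmark displacements.
    K = kernel(landmarks_src, landmarks_src)
    weights = np.linalg.solve(K, landmarks_dst - landmarks_src)

    # Evaluate the interpolated displacement field at every vertex.
    return vertices + kernel(vertices, landmarks_src) @ weights
```

By construction, the landmarks themselves land exactly on their targets, while the surrounding vertices follow smoothly with a falloff controlled by `sigma`.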
Viseme Prototype Extraction
The previous subsection described how a generic head model was deformed to
fit 3D snapshots. Not all frames were reconstructed, but only those that
represent the visemes (i.e., the most extreme mouth positions for the different
cases of Figure 2). About 80 frames were selected from the sequence for each
of the example faces. The corresponding visemes were not represented by the
3D reconstructions themselves (the adapted generic heads), but by the
difference of these heads with respect to the neutral head of the same person.
These deformation fields of all the different subjects still contain considerable
redundancy. This was investigated by applying a Principal Component Analysis (PCA).
Over 98.5% of the variance in the deformation fields was found in the space
spanned by the 16 most dominant components. We have used this statistical
method not only to obtain a very compact description of the different shapes, but
also to suppress small acquisition inaccuracies. The different instances of the
same viseme for the different subjects cluster in this space. The centroids of these
clusters were taken as the prototype visemes used to animate the faces later
on.
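The PCA-and-clustering pipeline above can be sketched as follows. The data here are random stand-ins for the real deformation fields, and the 0.985 variance threshold mirrors the figure quoted in the text; the variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in data: one deformation field per selected frame, flattened to a
# vector of per-vertex displacements (n_frames x 3 * n_vertices).
fields = rng.normal(size=(80, 300))
# Which viseme each frame represents (hypothetical labels).
viseme_labels = rng.integers(0, 16, size=80)

# PCA via SVD on the mean-centred deformation fields.
mean = fields.mean(axis=0)
centred = fields - mean
U, S, Vt = np.linalg.svd(centred, full_matrices=False)

# Keep the smallest number of components explaining >= 98.5% of the variance.
var_ratio = S ** 2 / np.sum(S ** 2)
k = int(np.searchsorted(np.cumsum(var_ratio), 0.985)) + 1

# Compact description: each field's coordinates in the dominant subspace.
coords = centred @ Vt[:k].T

# Prototype viseme = centroid of the cluster of instances of that viseme.
prototypes = {v: coords[viseme_labels == v].mean(axis=0)
              for v in np.unique(viseme_labels)}
```

Projecting onto the dominant subspace both compresses the representation and discards the small acquisition inaccuracies that live in the remaining, low-variance directions.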
Face Animation
The section Learning Viseme Expressions describes an approach to extract a
set of visemes from a face that could be observed in 3D while talking. This