3. Text-driven Face Animation
When text is used in communication, e.g., in text-based electronic chatting over the Internet or visual email, visual speech synthesized from the text can greatly help deliver information. Recent work on text-driven face animation includes that of Cohen and Massaro [Cohen and Massaro, 1993], Ezzat and Poggio [Ezzat and Poggio, 2000], and Waters and Levergood [Waters and Levergood, 1993].
Figure 5.2. The architecture of the text-driven talking face.
Similar to the work of Ezzat and Poggio [Ezzat and Poggio, 2000] and that of Waters and Levergood [Waters and Levergood, 1993], our framework adopts the key-frame-based face animation technique for text-driven face animation. The procedure of text-driven face animation is illustrated in Figure 5.2. Our framework uses the Microsoft Text-to-Speech (TTS) engine for text analysis and speech synthesis. First, the text stream is fed into the TTS engine. The TTS engine parses the text and generates the corresponding phoneme sequence, the timing information of the phonemes, and the synthesized speech stream. Each phoneme is mapped to a viseme via a lookup table, and each viseme serves as a key frame. The text is thus translated into a key frame sequence. A temporal trajectory is then synthesized from the key frame sequence using the technique described in Section 2.
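The pipeline above can be sketched as follows. The phoneme labels, timings, viseme table, and key shapes below are placeholder values, and simple linear interpolation stands in for the trajectory-synthesis technique of Section 2:

```python
# Sketch: (phoneme, time) pairs -> viseme key frames -> interpolated trajectory.
# The lookup table, key shapes, and timings are illustrative placeholders.

PHONEME_TO_VISEME = {"p": 0, "b": 0, "m": 0, "f": 1, "v": 1, "aa": 2}

# Each viseme index maps to a key shape (here, a toy 2-parameter mouth shape).
KEY_SHAPES = {0: [0.0, 0.0], 1: [0.3, 0.1], 2: [0.9, 0.6]}

def text_to_key_frames(phonemes_with_times):
    """Map (phoneme, start_time) pairs to (time, key_shape) key frames."""
    return [(t, KEY_SHAPES[PHONEME_TO_VISEME[p]]) for p, t in phonemes_with_times]

def synthesize_trajectory(key_frames, fps=30):
    """Linearly interpolate between key shapes over time
    (a stand-in for the trajectory synthesis of Section 2)."""
    frames = []
    for (t0, s0), (t1, s1) in zip(key_frames, key_frames[1:]):
        n = max(1, int((t1 - t0) * fps))
        for i in range(n):
            a = i / n
            frames.append([x0 + a * (x1 - x0) for x0, x1 in zip(s0, s1)])
    frames.append(list(key_frames[-1][1]))  # hold the final key shape
    return frames

# Example: the syllable "ma" -> /m/ at 0.0 s, /aa/ at 0.2 s
kf = text_to_key_frames([("m", 0.0), ("aa", 0.2)])
traj = synthesize_trajectory(kf, fps=30)
```

At 30 fps, the 0.2 s span between the two key frames yields six interpolated frames plus the held final shape.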
In the framework, we use a label system with forty-four phonemes. Seventeen viseme groups are designed to group visually similar phonemes together. The phonemes and their viseme group labels are shown in Table 5.1.
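The many-to-one phoneme-to-viseme mapping can be illustrated as a lookup table. Table 5.1 is not reproduced here, so the entries below are hypothetical groupings chosen by common visual similarity (e.g., the bilabials /p/, /b/, and /m/ share one closed-lip viseme):

```python
# Illustrative phoneme -> viseme-group lookup; the group names and
# memberships are assumptions, not the actual contents of Table 5.1.
VISEME_GROUPS = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "t": "alveolar", "d": "alveolar", "n": "alveolar",
}

def viseme_of(phoneme):
    """Return the viseme group label for a phoneme."""
    return VISEME_GROUPS[phoneme]
```

Because several phonemes collapse onto each viseme, the animation needs far fewer key shapes (seventeen) than there are phonemes (forty-four).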
In our experiment, we use the motion capture data described in Chapter 3 to train the key shape model for each viseme. Each shape is represented using MUPs,