3. Text-driven Face Animation
When text is used in communication, e.g., in text-based electronic chatting over the Internet or visual email, visual speech synthesized from the text can greatly help deliver information. Recent work on text-driven face animation includes that of Cohen and Massaro [Cohen and Massaro, 1993], Ezzat and Poggio [Ezzat and Poggio, 2000], and Waters and Levergood [Waters and Levergood, 1993].
Figure 5.2. The architecture of the text-driven talking face.
Similar to the work of Ezzat and Poggio [Ezzat and Poggio, 2000] and that of Waters and Levergood [Waters and Levergood, 1993], our framework adopts the key-frame-based face animation technique for text-driven face animation. The procedure of text-driven face animation is illustrated in Figure 5.2. Our framework uses the Microsoft Text-to-Speech (TTS) engine for text analysis and speech synthesis. First, the text stream is fed into the TTS engine. The TTS engine parses the text and generates the corresponding phoneme sequence, the timing information of the phonemes, and the synthesized speech stream. Each phoneme is mapped to a viseme via a lookup table, and each viseme serves as a key frame. The text is thus translated into a key frame sequence. A temporal trajectory is then synthesized from the key frame sequence using the technique described in Section 2.
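The pipeline above can be sketched as follows. The phoneme labels, timings, viseme table, and key shapes below are placeholder values, and simple linear interpolation stands in for the trajectory-synthesis technique of Section 2:

```python
# Sketch: (phoneme, time) pairs -> viseme key frames -> interpolated trajectory.
# The lookup table, key shapes, and timings are illustrative placeholders.

PHONEME_TO_VISEME = {"p": 0, "b": 0, "m": 0, "f": 1, "v": 1, "aa": 2}

# Each viseme index maps to a key shape (here, a toy 2-parameter mouth shape).
KEY_SHAPES = {0: [0.0, 0.0], 1: [0.3, 0.1], 2: [0.9, 0.6]}

def text_to_key_frames(phonemes_with_times):
    """Map (phoneme, start_time) pairs to (time, key_shape) key frames."""
    return [(t, KEY_SHAPES[PHONEME_TO_VISEME[p]]) for p, t in phonemes_with_times]

def synthesize_trajectory(key_frames, fps=30):
    """Linearly interpolate between key shapes over time
    (a stand-in for the trajectory synthesis of Section 2)."""
    frames = []
    for (t0, s0), (t1, s1) in zip(key_frames, key_frames[1:]):
        n = max(1, int((t1 - t0) * fps))
        for i in range(n):
            a = i / n
            frames.append([x0 + a * (x1 - x0) for x0, x1 in zip(s0, s1)])
    frames.append(list(key_frames[-1][1]))  # hold the final key shape
    return frames

# Example: the syllable "ma" -> /m/ at 0.0 s, /aa/ at 0.2 s
kf = text_to_key_frames([("m", 0.0), ("aa", 0.2)])
traj = synthesize_trajectory(kf, fps=30)
```

At 30 fps, the 0.2 s span between the two key frames yields six interpolated frames plus the held final shape.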
In the framework, we use a label system with forty-four phonemes. Seventeen viseme groups are designed to group visually similar phonemes together. The phonemes and their viseme group labels are shown in Table 5.1.
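The many-to-one phoneme-to-viseme mapping can be illustrated as a lookup table. Table 5.1 is not reproduced here, so the entries below are hypothetical groupings chosen by common visual similarity (e.g., the bilabials /p/, /b/, and /m/ share one closed-lip viseme):

```python
# Illustrative phoneme -> viseme-group lookup; the group names and
# memberships are assumptions, not the actual contents of Table 5.1.
VISEME_GROUPS = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "t": "alveolar", "d": "alveolar", "n": "alveolar",
}

def viseme_of(phoneme):
    """Return the viseme group label for a phoneme."""
    return VISEME_GROUPS[phoneme]
```

Because several phonemes collapse onto each viseme, the animation needs far fewer key shapes (seventeen) than there are phonemes (forty-four).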
In our experiment, we use the motion capture data described in Chapter 3 to train the key shape model for each viseme. Each shape is represented using MUPs,