GEOMETRIC FACIAL MOTION SYNTHESIS - 3D Face Processing: Modeling, Analysis and Synthesis

Graphics Reference

In-Depth Information

Figure 5.3. Four of the key shapes. The top row images are front views and the bottom row

images are the side views. The largest components of variances are (a): 0.67; (b): 1.0;, (c):

0.18; (d): 0.19.

using only speech signals requires a complicated continuous speech recognizer,

and the phoneme recognition rate and the timing information may not be ac-

curate enough. The text script associated with speech, however, provides the

accurate word-level transcription so that it greatly simplifies the complexity

and also improves the accuracy. We use a phoneme speech alignment tool,

which comes with Hidden Markov Model Toolkit (HTK) 2.0 [HTK, 2004] for

phoneme recognition and alignment. Speech stream is decoded into phoneme

sequence with timing information. Once we have the phoneme sequence and

the timing information, the remaining part of the procedure of the visual speech

synthesis is similar to text driven face animation.

5. Real-time Speech-driven Face Animation

Real-time speech-driven face animation is demanded for real-time two-way

communications such as teleconferencing. The key issue is to derive the real-

time audio-to-visual mapping. In Section 5.1, we describe a formant-analysis-

based feature for real-time speech animation. Then we present a real-time

audio-to-visual mapping based on artificial neural network (ANN) in Sec-

tion 5.2.

Search WWH ::

Custom Search

Home