GEOMETRIC FACIAL MOTION SYNTHESIS - 3D Face Processing: Modeling, Analysis and Synthesis

Figure 5.4. The architecture of the offline speech-driven talking face.

5.1 Formant features for real-time speech-driven face animation

Multiple acoustic features are correlated with vocal tract shape. LPC features
are among the most widely used features for speech-driven animation [Brand,
1999, Curinga et al., 1996, Morishima and Yotsukura, 1999]. In this section,
we describe a technique that uses formant frequencies as acoustic features
because they are directly related to vowel-like sounds, including vowels,
diphthongs and semivowels. Vowel sounds account for the major shapes of the
mouth and contribute most to its movement. Thus formant analysis enables us
to build a simple yet effective mapping for speech-driven animation
[Wen et al., 2001].

5.1.1 Formant analysis

The human speech production system consists of two main components, the
vocal cords and the vocal tract. The vocal cord excitation serves as the
signal source, while the vocal tract acts as a time-varying filter. The
characteristics of these two components determine the final output speech.
In speech production, the resonance frequencies of the vocal tract tube are
called formant frequencies, or simply formants. The formant frequencies
depend on the shape of the vocal tract, and each shape is characterized by a
set of formants [Rabiner and Schafer, 1978].
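The relation between filter resonances and formants can be illustrated with a small sketch. It is not the method used in our system; it assumes a discrete-time all-pole model of the vocal tract filter, in which each formant corresponds to one complex-conjugate pole pair. The function names, the sample rate and the three formant and bandwidth values are hypothetical, chosen only for illustration. The sketch converts formants to poles, recovers them again from the poles, and checks that the peaks of the filter's magnitude response fall close to the formant frequencies.

```python
import cmath
import math


def formant_to_pole(freq_hz, bandwidth_hz, sample_rate_hz):
    """Upper-half-plane z-domain pole for one resonance (standard mapping)."""
    radius = math.exp(-math.pi * bandwidth_hz / sample_rate_hz)
    angle = 2.0 * math.pi * freq_hz / sample_rate_hz
    return cmath.rect(radius, angle)


def pole_to_formant(pole, sample_rate_hz):
    """Invert formant_to_pole: return (frequency, bandwidth) in Hz."""
    freq_hz = cmath.phase(pole) * sample_rate_hz / (2.0 * math.pi)
    bandwidth_hz = -math.log(abs(pole)) * sample_rate_hz / math.pi
    return freq_hz, bandwidth_hz


def filter_gain(poles, freq_hz, sample_rate_hz):
    """Magnitude of the all-pole filter 1/A(z) evaluated at freq_hz."""
    z = cmath.exp(1j * 2.0 * math.pi * freq_hz / sample_rate_hz)
    gain = 1.0
    for p in poles:
        # Each resonance contributes a pole and its complex conjugate.
        gain /= abs((1 - p / z) * (1 - p.conjugate() / z))
    return gain


def spectral_peaks(poles, sample_rate_hz, step_hz=10):
    """Grid frequencies (Hz) where the magnitude response is a local maximum."""
    freqs = list(range(0, int(sample_rate_hz / 2) + 1, step_hz))
    gains = [filter_gain(poles, f, sample_rate_hz) for f in freqs]
    return [
        freqs[i]
        for i in range(1, len(freqs) - 1)
        if gains[i - 1] < gains[i] > gains[i + 1]
    ]


# Hypothetical values for illustration only: (formant frequency, bandwidth) in Hz.
SAMPLE_RATE_HZ = 10_000
ILLUSTRATIVE_FORMANTS = [(700.0, 80.0), (1200.0, 90.0), (2600.0, 120.0)]

poles = [formant_to_pole(f, b, SAMPLE_RATE_HZ) for f, b in ILLUSTRATIVE_FORMANTS]
recovered = [pole_to_formant(p, SAMPLE_RATE_HZ) for p in poles]
peaks = spectral_peaks(poles, SAMPLE_RATE_HZ)

print("recovered (frequency, bandwidth):", recovered)
print("magnitude-response peaks (Hz):", peaks)
```

Changing the entries of `ILLUSTRATIVE_FORMANTS` mimics a change of vocal tract shape: the pole positions move, and the peaks of the magnitude response move with them.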

In practice, formant analysis is widely used to extract vocal tract
characteristics for speech analysis and synthesis. Many methods are available
for formant estimation. In our system, we use a method based on LPC
parameters [Rabiner
