GEOMETRIC FACIAL MOTION SYNTHESIS - 3D Face Processing: Modeling, Analysis and Synthesis


approaches using only one neural network for all audio features [Lavagetto, 1995; Massaro et al., 1999], our local ANN mapping (i.e., one neural network for each audio feature cluster) is more efficient because each ANN is much simpler and can therefore be trained with much less effort on a given set of training data. More generally, speech-driven animation can be used in speech and language education [Cole et al., 1999], as a speech understanding aid in noisy environments and for hard-of-hearing people, and as a rehabilitation tool for treating facial motion disorders.
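
The structure of a local mapping can be illustrated with a minimal sketch. It assumes that each audio feature vector is routed to the cluster with the nearest centroid and then passed to that cluster's own model. The names `nearest_cluster` and `local_map`, the nearest-centroid rule, and the placeholder models are hypothetical and stand in for the trained per-cluster ANNs; they are not taken from the text.

```python
def nearest_cluster(feature, centroids):
    """Return the index of the centroid closest to the audio feature vector."""
    def squared_distance(k):
        return sum((f - c) ** 2 for f, c in zip(feature, centroids[k]))
    return min(range(len(centroids)), key=squared_distance)


def local_map(feature, centroids, local_models):
    """Map an audio feature to visual parameters with its cluster's own model.

    local_models[k] is any callable for cluster k; in the setting described
    above it would be a small trained neural network (assumption: one per cluster).
    """
    k = nearest_cluster(feature, centroids)
    return local_models[k](feature)


# Hypothetical example: two clusters, each with a trivial placeholder model.
centroids = [(0.0, 0.0), (10.0, 10.0)]
local_models = [
    lambda f: [0.5 * f[0]],          # placeholder for the ANN of cluster 0
    lambda f: [f[0] - f[1] + 1.0],   # placeholder for the ANN of cluster 1
]
visual = local_map((9.0, 11.0), centroids, local_models)
```

Because each model only has to cover the features that fall into its own cluster, it can be kept small, which is the efficiency argument made above.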

5.2.4 Human emotion perception study

The synthetic talking face, which is used to convey visual cues to humans, can be evaluated through human perception studies. Here, we describe experiments that compare the influence of the synthetic talking face on human emotion perception with that of the real face. We conducted similar experiments for 2D MU-based speech-driven animation [Hong et al., 2002]. The experimental results can help users decide how to use the synthetic talking face to deliver the intended visual information.

We videotape a speaking subject who is asked to calmly read three sentences with three facial expressions: (1) neutral, (2) smile, and (3) sad, respectively. Because the sentences are read calmly, the audio tracks do not convey any emotional information. The contents of the three sentences are: (1) “It is normal.”; (2) “It is good.”; and (3) “It is bad.”. The information associated with the sentences is: (1) neutral; (2) positive; and (3) negative. The audio tracks are used to generate three sets of face animation sequences. All three audio tracks are used in each set of animation sequences. The first set is generated without expression. The second set is generated with a smile expression. The third set is generated with a sad expression. The facial deformations due to speech and expression are linearly combined in our experiments. Sixteen untrained human subjects, who have never used our system before, participate in the experiments.
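
The linear combination of speech and expression deformation mentioned above can be sketched as follows. This is only an illustration under stated assumptions: deformations are represented as per-vertex 3D displacements added to a neutral face, and the function name `combine_deformations` and the two weights are hypothetical. The text states only that the two deformations are linearly combined.

```python
def combine_deformations(neutral, speech_delta, expression_delta,
                         speech_weight=1.0, expression_weight=1.0):
    """Add weighted speech and expression displacements to the neutral face.

    Each argument is a list of (x, y, z) tuples, one per vertex (assumed layout).
    """
    return [
        tuple(n + speech_weight * s + expression_weight * e
              for n, s, e in zip(nv, sv, ev))
        for nv, sv, ev in zip(neutral, speech_delta, expression_delta)
    ]


# Hypothetical two-vertex face: speech moves vertex 0, expression moves both.
neutral = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
speech_delta = [(0.0, -0.5, 0.0), (0.0, 0.0, 0.0)]
smile_delta = [(0.25, 0.25, 0.0), (0.0, 0.0, 0.5)]
frame = combine_deformations(neutral, speech_delta, smile_delta)
```

Setting `expression_weight` to zero in this sketch corresponds to the first set of sequences, which is generated without expression.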

The first experiment investigates human emotion perception based on either the visual stimuli alone or the audio stimuli alone. The subjects are first asked to recognize the expressions of both the real face and the synthetic talking face and to infer their emotional states based on the animation sequences without audio. All subjects correctly recognize the expressions of both the synthetic face and the real face. Therefore, our synthetic talking face is capable of accurately delivering facial expression information. The emotion inference results, given as numbers of subjects, are shown in Table 5.2. The “S” columns in Table 5.2, as well as in Tables 5.4, 5.5, and 5.6, show the results using the synthetic talking face. The “R” columns show the results using the real face. As shown, the effectiveness of the synthetic talking face is comparable with that of the real face. The subjects are then asked to listen to the audio and decide the emotional state of the speaker. Each subject listens to each audio track only once.
