The second and third experiments are designed to compare the influence of a
synthetic face on bimodal human emotion perception with that of a real face.
In the second experiment, the subjects are asked to infer the emotional state
while observing the synthetic talking face and listening to the audio tracks. The
third experiment is the same as the second one, except that the subjects observe
the real face instead. In each experiment, the audio-visual stimuli are presented
in two groups. In the first group, the audio content and the visual information
convey the same kind of information (e.g., positive text with a smiling
expression). In the second group, they convey conflicting information. The results
of both experiments are combined in Table 2.
We can see that the facial movements and the content of the audio tracks jointly
influence the subjects' decisions. If the audio content and the facial expression
convey the same kind of information, the human perception of that information is
enhanced. For example, when the audio track with positive text content is
accompanied by a smiling expression, nearly all subjects say that the emotional
state is happy (see Table 2). The number of subjects who perceive a happy
emotional state is higher than when only one stimulus is used (see Table 1).
However, human subjects are confused when the facial expression and the audio
track convey opposite information. An example is shown in the fifth and sixth
columns of Table 2, where the audio content conveys positive information while
the facial expression is sad. Ten subjects report a sad emotional state when the
synthetic talking face with a sad expression is shown; the number increases to 12
when the real face is used. This difference suggests that the subjects tend to
trust the real face more than the synthetic face when the visual information
conflicts with the audio information. Overall, the experiments show that our
real-time, speech-driven synthetic talking face successfully affects human
emotion perception. Its effectiveness is comparable to, although slightly weaker
than, that of the real face.
Conclusions
This chapter presents a unified framework for learning compact facial
deformation models from data and applying the models to facial motion analysis
and synthesis. The framework uses a 3D facial motion capture database to learn
compact holistic and parts-based facial deformation models called MUs, which
are used to approximate arbitrary facial deformation.
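To make the role of the MUs concrete, the sketch below shows one common way such a linear deformation basis can be learned and applied. It is a minimal illustration assuming a PCA-style formulation and NumPy; the function and variable names (learn_motion_units, approximate_deformation, and the random stand-in data) are invented for this example and are not taken from the chapter's implementation.

# Hedged sketch: approximating facial deformation as a linear combination of
# learned Motion Units (MUs). PCA via SVD is an assumption for illustration.
import numpy as np

def learn_motion_units(deformations, num_mus):
    """Learn MUs from a matrix of facial deformations.

    deformations: (num_frames, 3 * num_vertices) array; each row holds the
                  per-vertex displacements of one captured frame.
    num_mus:      number of motion units (principal components) to keep.
    """
    mean_deformation = deformations.mean(axis=0)
    centered = deformations - mean_deformation
    # Rows of vt are orthonormal deformation bases ordered by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    motion_units = vt[:num_mus]                 # (num_mus, 3 * num_vertices)
    return mean_deformation, motion_units

def approximate_deformation(target, mean_deformation, motion_units):
    """Project an arbitrary deformation onto the MU subspace."""
    coeffs = motion_units @ (target - mean_deformation)
    reconstruction = mean_deformation + coeffs @ motion_units
    return coeffs, reconstruction

# Example with random stand-in data (a real system would use the
# 3D facial motion capture database described in the chapter).
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3 * 500))          # 200 frames, 500 vertices
mean_d, mus = learn_motion_units(data, num_mus=7)
coeffs, approx = approximate_deformation(data[0], mean_d, mus)

In such a formulation, the coefficients recovered for each observed frame give a compact description of the deformation for analysis, while setting the coefficients directly drives the deformation for synthesis.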
The learned models are used in robust 3D facial motion analysis and real-time,
speech-driven face animation. The experiments demonstrate that robust non-rigid
face tracking and flexible, natural face animation can be achieved based on the
learned models. In the future, we plan to investigate systematic ways of adapting
learned models for