The dynamics of facial motion are complex and therefore difficult to model
with explicit dynamics equations. Data-driven models, such as the Hidden
Markov Model (HMM) [Rabiner, 1989], provide an effective alternative. One
example is "voice puppetry" [Brand, 1999], in which an HMM trained by entropy
minimization models the dynamics of facial motion during speech. The trained
HMM is then used offline to generate a smooth facial deformation trajectory
from a given speech signal.
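As an illustrative sketch only (not Brand's entropy-minimization training, which is considerably more involved), the toy discrete HMM below with hand-picked parameters shows the two basic operations such a model supports: scoring an observation sequence with the forward algorithm, and decoding a most-likely hidden-state path with Viterbi. All numbers are invented for illustration.

```python
import numpy as np

# Toy discrete HMM: 2 hidden "viseme-like" states, 3 observation symbols.
# All parameters are illustrative, not taken from any trained model.
pi = np.array([0.6, 0.4])            # initial state distribution
A = np.array([[0.7, 0.3],            # state transition matrix
              [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1],       # emission probabilities B[state, symbol]
              [0.1, 0.3, 0.6]])

def forward(obs):
    """Forward algorithm: P(observation sequence | model)."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

def viterbi(obs):
    """Most likely hidden-state path for the observation sequence."""
    delta = np.log(pi) + np.log(B[:, obs[0]])
    back = []
    for o in obs[1:]:
        trans = delta[:, None] + np.log(A)   # trans[i, j]: best score ending in j via i
        back.append(trans.argmax(axis=0))
        delta = trans.max(axis=0) + np.log(B[:, o])
    path = [int(delta.argmax())]
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return path[::-1]

obs = [0, 1, 2, 2]
print(forward(obs))   # likelihood of the sequence under the toy model
print(viterbi(obs))   # → [0, 0, 1, 1]
```

In the speech-to-animation setting, the hidden states would correspond to facial configurations and the observations to acoustic features; decoding then yields a state sequence from which a smooth deformation trajectory can be generated.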
2. Motion Capture Database
To study the complex motion of the face during speech and expression, we
need an extensive motion capture database. The database can be used to
learn facial motion models. Furthermore, it will benefit future studies on
bimodal speech perception and on the development and evaluation of synthetic
talking heads. In our framework, we have experimented with both data collected
using a Motion Analysis™ system and the facial motion capture data provided by
Dr. Brian Guenter [Guenter et al., 1998] of Microsoft Research.
The MotionAnalysis EvaRT 3.2 system [MotionAnalysis, 2002] is a marker-based
capture device that can be used to capture geometric facial deformation. An
example of the marker layout is shown in Figure 3.1; there are 44 markers on
the face. Such marker-based capture devices have high temporal resolution (up
to 300 fps); however, the spatial resolution is low (only tens of markers on
the face are feasible). Appearance details due to facial deformation are
therefore handled using our flexible appearance model presented in Chapter 6.
The Microsoft data, collected by Guenter et al. [Guenter et al., 1998], uses
153 markers. Figure 3.2 shows an example of the markers. For better
visualization, we build a mesh based on those markers, illustrated in
Figure 3.2 (b) and (c).
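As a sketch of how such a visualization mesh can be built from sparse markers, one common approach for near-frontal face captures is to triangulate the markers' 2D frontal projection with a Delaunay triangulation and reuse that connectivity for the 3D points. The marker coordinates below are random stand-ins, not the actual capture data.

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(0)
# Stand-in for captured marker positions: random 3D points (not real data).
markers_3d = rng.random((20, 3))

# Project onto the frontal (x, y) plane and triangulate; the resulting
# 2D connectivity is reused as the face list of the 3D mesh.
tri = Delaunay(markers_3d[:, :2])
faces = tri.simplices            # (n_triangles, 3) vertex indices

print(faces.shape)               # each row is one triangle of 3 marker indices
```

This is only one reasonable reconstruction strategy; the mesh actually used for Figure 3.2 may have been built differently (e.g., hand-designed connectivity).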