of facial motions which may facilitate semantic analysis, psychologists have
proposed the Facial Action Coding System (FACS) [Ekman and Friesen, 1977].
FACS is based on anatomical studies of facial muscular activity, and it
enumerates all Action Units (AUs) of a face that cause facial movements. Currently,
FACS is widely used as the underlying visual representation for facial motion
analysis, coding, and animation. The Action Units, however, lack quantitative
definitions and temporal descriptions. Therefore, computer scientists usually
need to devise their own definitions in their computational models of AUs [Tao
and Huang, 1999]. Because of the high complexity of natural non-rigid facial
motion, these models usually need extensive manual adjustments to achieve
realistic results.
Recently, there have been considerable advances in motion capture technology.
It is now possible to collect large amounts of real human motion data.
For example, the Motion Analysis™ system [MotionAnalysis, 2002] uses
multiple high-speed cameras to track the 3D movement of reflective markers. The
motion data can be used in movies, video games, industrial measurement, and
research in movement analysis. Because motion capture data are increasingly
available, researchers have begun to apply machine learning techniques to learn
motion models from the data. Such models capture the characteristics of
real human motion. Examples are the linear subspace models of facial
motion learned in [Kshirsagar et al., 2001, Hong et al., 2001b, Reveret and Essa,
2001]. In these models, arbitrary face deformation can be approximated by a
linear combination of the learned basis.
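As an illustration of approximating a face deformation by a linear combination of a learned basis, the following minimal sketch learns a basis from marker data with principal component analysis (computed here through a singular value decomposition) and then reconstructs an unseen shape. It is not the implementation used in the cited work. The data are synthetic, and the function names, the number of markers, and the number of basis vectors are assumptions made only for this example.

```python
import numpy as np


def learn_linear_basis(frames, num_basis):
    """Learn a mean shape and an orthonormal deformation basis with PCA.

    frames: (num_frames, 3 * num_markers) array; each row stacks the 3D
            coordinates of all markers in one captured frame.
    """
    mean_shape = frames.mean(axis=0)
    centered = frames - mean_shape
    # Rows of vt are the principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean_shape, vt[:num_basis]


def approximate(shape, mean_shape, basis):
    """Approximate a shape as mean_shape plus a linear combination of basis rows."""
    # The basis rows are orthonormal, so projection gives least-squares coefficients.
    coeffs = basis @ (shape - mean_shape)
    return mean_shape + coeffs @ basis, coeffs


# Synthetic stand-in for motion capture data (hypothetical, not real measurements).
rng = np.random.default_rng(0)
num_markers, num_frames, true_rank = 20, 200, 4
neutral = rng.normal(size=3 * num_markers)
true_modes = rng.normal(size=(true_rank, 3 * num_markers))
weights = rng.normal(size=(num_frames, true_rank))
noise = 0.01 * rng.normal(size=(num_frames, 3 * num_markers))
frames = neutral + weights @ true_modes + noise

mean_shape, basis = learn_linear_basis(frames, num_basis=true_rank)

# Approximate a deformation that was not part of the training frames.
new_shape = neutral + rng.normal(size=true_rank) @ true_modes
approx, coeffs = approximate(new_shape, mean_shape, basis)
relative_error = (np.linalg.norm(new_shape - approx)
                  / np.linalg.norm(new_shape - mean_shape))
print("coefficients:", np.round(coeffs, 3))
print("relative reconstruction error:", relative_error)
```

Because the basis rows are orthonormal, the coefficients are obtained by a single matrix-vector product; a non-orthogonal, hand-designed basis would instead require solving a least-squares problem.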
In this topic, we present our 3D facial deformation models derived from
motion capture data. Principal component analysis (PCA) [Jolliffe, 1986] is
applied to extract a few basis vectors whose linear combinations explain the major
variations in the motion capture data. We call these basis vectors Motion Units (MUs), in a
similar spirit to AUs. Compared to AUs, MUs are derived automatically from
motion capture data, which avoids the labor-intensive manual work of
designing AUs. Moreover, MUs have smaller reconstruction error than AUs when
linear combinations are used to approximate arbitrary facial shapes. Based on
MUs, we have developed a 3D non-rigid face tracking system. The subspace
spanned by MUs is used to constrain the noisy image motion estimation, such
as optical flow. As a result, the estimated non-rigid motion can be more robust. We
demonstrate the efficacy of the tracking system in model-based very low bit-rate
face video coding. The linear combinations of MUs can also be used to deform
a 3D face surface for face animation. In the iFACE system, we have developed
text-driven and speech-driven face animation. Both of them use MUs
as the underlying representation of face deformation. One particular type of
animation is real-time speech-driven face animation, which is useful for real-
time two-way communications such as teleconferencing. We have used MUs
as the visual representation to learn an audio-to-visual mapping. The mapping