presented in [Basu et al., 1998, Reveret and Essa, 2001] only deal with lips.
The trained 3D model encodes the characteristics of real lip deformations.
Principal component analysis is used in [Hong et al., 2001b, Kshirsagar et al.,
2001, Reveret and Essa, 2001] to derive a basis for the facial deformation model.
The basis can then be used for face animation [Hong et al., 2001b, Kshirsagar
et al., 2001, Reveret and Essa, 2001] and tracking [Hong et al., 2001b].
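As a rough illustration of the PCA step, the sketch below derives a deformation basis from a set of registered training meshes. It assumes that every mesh shares the neutral face's vertex ordering and is flattened into one vector. The function name `pca_deformation_basis` and the SVD-based formulation are illustrative choices, not the exact procedure of the cited works.

```python
import numpy as np

def pca_deformation_basis(shapes, neutral, k):
    """Derive a k-dimensional linear basis of facial deformation via PCA.

    shapes  : (F, N) array, each row a flattened 3D face mesh (N = 3 * num_vertices)
    neutral : (N,) flattened neutral face with the same vertex ordering
    k       : number of basis vectors to keep
    Returns (mean_deformation, basis, explained) where basis is (N, k) with
    orthonormal columns and explained is the fraction of variance retained.
    """
    deformations = shapes - neutral          # per-frame displacement from neutral
    mean = deformations.mean(axis=0)
    centered = deformations - mean
    # Right singular vectors of the centered data are the principal directions.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k].T                         # (N, k)
    explained = (s[:k] ** 2).sum() / (s ** 2).sum()
    return mean, basis, explained
```

A new deformation `d` is then encoded by the coefficients `basis.T @ (d - mean)`, and `explained` gives a simple guide for choosing `k`.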
2. Geometric MU-based 3D Face Tracking
In conventional 3D non-rigid face tracking algorithms that use a FACS-based
3D facial deformation model, the subspace spanned by the Action Units (AUs)
is used as high-level knowledge to guide face tracking. Like MUs,
AUs are defined such that an arbitrary facial deformation is approximated by a
linear combination of AUs. However, AUs are usually designed manually.
For these approaches, our automatically learned MUs can be used in place of
the manually designed AUs. In this way, extensive manual intervention is
avoided, and natural facial deformations can be approximated more accurately.
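The linear-combination assumption can be made concrete: given a basis matrix whose columns are AUs or MUs, the coefficients that best approximate an observed deformation follow from least squares, and the relative residual measures how well the basis covers that deformation. This is a minimal sketch with hypothetical helper names (`fit_coefficients`, `approximation_error`); it does not reproduce any particular tracking system.

```python
import numpy as np

def fit_coefficients(basis, deformation):
    """Least-squares coefficients p minimizing ||basis @ p - deformation||^2.

    basis       : (N, K) matrix whose columns are deformation units (AUs or MUs)
    deformation : (N,) displacement of the mesh vertices from the neutral face
    """
    p, *_ = np.linalg.lstsq(basis, deformation, rcond=None)
    return p

def approximation_error(basis, deformation):
    """Relative norm of the part of the deformation the basis cannot express."""
    p = fit_coefficients(basis, deformation)
    residual = deformation - basis @ p
    return np.linalg.norm(residual) / np.linalg.norm(deformation)
```

Evaluating `approximation_error` on held-out natural deformations for a manually designed basis and for a learned one is one simple way to compare how well each approximates them.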
We choose to use the learned MUs in the 3D non-rigid face tracking system
proposed in [Tao and Huang, 1999] because the system has been shown to
be (1) robust in the face of gradual background changes, (2) able to recover from
temporary loss of tracking, and (3) real-time in tracking speed. For these
reasons, this tracking system has been used effectively for bimodal speech
recognition [Zhang et al., 2000] and emotion recognition [Cohen et al., 2003].
The facial motion observed in the image plane can be represented by

$$V_{2d} = M\left(R\,(V_0 + LP) + T\right) \tag{4.1}$$

where $M$ is the projection matrix, $V_0$ is the neutral face, $LP$ defines the
non-rigid deformation, $R$ is the 3D rotation determined by the three rotation angles
$W = [\omega_x, \omega_y, \omega_z]^T$, and $T$ stands for the 3D translation. $L$ is an $N \times M$ matrix that
contains $M$ AUs, each of which is an $N$-dimensional vector, and $P = [p_1, \ldots, p_M]^T$
contains the coefficients of the AUs. To estimate the facial motion parameters $\{T, W, P\}$
from the 2D inter-frame motion $dV_{2d}$, the derivative of equation 4.1 is taken with respect
to $\{T, W, P\}$. Then, a linear equation between $dV_{2d}$ and $\{dT, dW, dP\}$
can be derived by ignoring higher-order derivatives (see details in [Tao and Huang,
1999]). The system estimates $dV_{2d}$ using template-matching-based optical
flow. After that, the linear system is solved by least squares in a
multi-resolution manner for efficiency and robustness.
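The sketch below evaluates the model of equation 4.1 and performs one linearized least-squares update of the motion parameters from a given 2D displacement. Several simplifications are assumptions made only for illustration: a weak-perspective 2 × 3 projection matrix, an Rz Ry Rx Euler-angle convention for the rotation, a central-difference Jacobian in place of the analytic derivative, and a displacement field supplied directly rather than computed by template-matching optical flow. The multi-resolution scheme is omitted. The number of basis vectors is written K to avoid clashing with the projection matrix M, and all function names are hypothetical.

```python
import numpy as np

def rotation_matrix(w):
    """Rotation from three angles w = (wx, wy, wz), composed as Rz @ Ry @ Rx (assumed convention)."""
    wx, wy, wz = w
    cx, sx = np.cos(wx), np.sin(wx)
    cy, sy = np.cos(wy), np.sin(wy)
    cz, sz = np.cos(wz), np.sin(wz)
    rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return rz @ ry @ rx

def project(theta, v0, basis, proj):
    """Evaluate V_2d = M (R (V_0 + L P) + T) for all vertices.

    theta : float vector of parameters [T (3), W (3), P (K)]
    v0    : (n, 3) neutral face vertices
    basis : (3n, K) deformation basis L, rows ordered x1, y1, z1, x2, ...
    proj  : (2, 3) projection matrix M (weak perspective assumed here)
    Returns a flat (2n,) vector of projected image coordinates.
    """
    t, w, p = theta[:3], theta[3:6], theta[6:]
    shape = v0 + (basis @ p).reshape(-1, 3)      # non-rigid deformation V_0 + LP
    moved = shape @ rotation_matrix(w).T + t     # rigid motion R(.) + T
    return (moved @ proj.T).ravel()              # image projection M(.)

def numerical_jacobian(theta, v0, basis, proj, eps=1e-6):
    """Central-difference Jacobian of project() with respect to theta."""
    cols = []
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        cols.append((project(theta + step, v0, basis, proj)
                     - project(theta - step, v0, basis, proj)) / (2 * eps))
    return np.stack(cols, axis=1)

def linearized_update(theta, flow, v0, basis, proj):
    """Solve J d_theta ~= dV_2d by least squares, keeping first-order terms only."""
    jac = numerical_jacobian(theta, v0, basis, proj)
    d_theta, *_ = np.linalg.lstsq(jac, flow, rcond=None)
    return d_theta
```

Under weak perspective the depth component of T does not affect the image, so its Jacobian column is zero; `np.linalg.lstsq` then returns the minimum-norm solution, which leaves that component unchanged.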
In the original system, $L$ was manually designed using Bézier volumes and
represented by the displacements of the vertices of the face surface mesh. The design
process was labor-intensive. To derive $L$ from the learned MUs in our current
system, the “MU fitting” process described in Chapter 3 is used. This
adaptation requires that the face be in its neutral position in the first image