In the current system, we use the holistic MUs derived in Section 3 of Chapter 3. Parts-based MUs could be used if a particular local region is the focus of interest, such as the lips in speech reading. The system is implemented on a PC with two 2.2 GHz Pentium 4 processors and 2 GB of memory. The input video resolution is 640 × 480. With only one CPU employed, the system runs at 14 frames per second for non-rigid face tracking. The tracking results can be visualized by using the MU coefficients, together with the rigid motion parameters R and T, to directly animate face models. Figure 4.1 shows some typical tracked frames, along with the animated face model that visualizes the results. It can be observed that, compared with the neutral face (the first-column images), the mouth opening (the second column) and the subtle mouth rounding and protruding (the third and fourth columns) are captured in the tracking results visualized by the animated face model.
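The visualization step described above can be sketched as a small computation: each tracked frame yields a set of MU coefficients plus a rigid head pose, and the mesh is reconstructed as the neutral shape plus a weighted sum of MU displacement fields, followed by the rigid transform. The function and variable names below (`animate_face`, `motion_units`, and so on) are illustrative assumptions, not the actual API of the system.

```python
import numpy as np

def animate_face(neutral_verts, motion_units, coeffs, R, T):
    """Deform the neutral mesh by a linear combination of Motion Units,
    then apply the rigid head pose (rotation R, translation T).

    neutral_verts : (V, 3) neutral-face vertex positions
    motion_units  : (K, V, 3) per-MU vertex displacement fields
    coeffs        : (K,) MU coefficients from the tracker
    """
    # Non-rigid deformation: neutral shape plus weighted MU displacements.
    deformed = neutral_verts + np.tensordot(coeffs, motion_units, axes=1)
    # Rigid motion: rotate and translate every vertex.
    return deformed @ R.T + T

# Toy example: 4 vertices, 2 motion units (all values hypothetical).
neutral = np.zeros((4, 3))
mus = np.random.randn(2, 4, 3) * 0.01   # small displacement fields
c = np.array([0.5, -0.2])               # MU coefficients per frame
R = np.eye(3)                           # estimated head rotation
T = np.array([0.0, 0.0, 1.0])          # estimated head translation
verts = animate_face(neutral, mus, c, R, T)
print(verts.shape)  # (4, 3)
```

Because the deformation model is linear in the coefficients, the animated mesh can be updated at frame rate, which is what makes this a convenient way to inspect tracking quality.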
3. Applications of Geometric 3D Face Tracking
The facial motion synthesis using tracking results can be used in model-
based face video coding such as [Huang and Tao, 2001, Tu et al., 2003]. In
our face video coding experiments [Tu et al., 2003], we track and encode the
face area using model-based coding. To encode the residual in face area and
the background for which a-priori knowledge is not generally available, we
use the traditional waveform-based coding method H.26L. This hybrid approach improves the robustness of the model-based method at the expense of an increased bit-rate. Eisert et al. [Eisert et al., 2000] proposed a similar hybrid coding technique using a different model-based 3D facial motion tracking approach. We capture and code videos of 352 × 240 resolution at 30 Hz. At the same low bit-rate (18 kbits/s), we compare this hybrid coding with the H.26L JM 4.2 reference software.
Figure 4.2 shows three snapshots of the 147-frame video used in our experiment. For this video, the Peak Signal-to-Noise Ratio (PSNR) around the facial area is 2 dB higher for hybrid coding than for H.26L. Moreover, the hybrid coding results have much higher visual quality. Because our tracking system works in real time, it could be used in a real-time, low bit-rate video phone application. More details of the model-based face video coding application are discussed in Chapter 9.
Besides low bit-rate video coding, the tracking results can be used as visual features for audio-visual speaker-independent speech recognition [Zhang et al., 2000] and emotion recognition [Cohen et al., 2003]. The bimodal speech
recognition system improves the speech recognition rate in noisy environments.
The emotion recognition system is being used to monitor students' emotional
and cognitive states in a computer-aided instruction application. In medical
applications related to facial motion disorder such as facial paralysis, visual cues
are important for both diagnosis and treatment. Therefore, the facial motion
analysis method can be used as a diagnostic tool such as in [Wachtman et al.,