GEOMETRIC MODEL-BASED 3D FACE TRACKING - 3D Face Processing: Modeling, Analysis and Synthesis

In the current system, we use the holistic MUs derived in Section 3 of Chapter 3.

Parts-based MUs could be used if a certain local region is the focus of

interest, such as the lips in speech reading. The system is implemented on a

PC with two 2.2 GHz Pentium 4 processors and 2 GB of memory. The image

size of the input video is 640 × 480. With only one CPU employed, the system

works at 14 frames/second for non-rigid face tracking. The tracking results

can be visualized by using the coefficients of MUs, R and to directly ani-

mate face models. Figure 4.1 shows some typical frames that it tracked, along

with the animated face model to visualize the results. It can be observed that,

compared with the neutral face (the images in the first column), the mouth opening (the

second column) and the subtle mouth rounding and mouth protruding (the third and

fourth columns) are captured in the tracking results visualized by the animated face

models.
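
As a rough illustration of how tracked parameters could drive a face model, the sketch below deforms a neutral mesh by a weighted sum of MUs and then applies a rigid pose. It is only a sketch under stated assumptions: the linear combination of MUs, the translation vector, and the names `animate_vertices` and `rotation_about_y` are hypothetical and are not taken from the system described here.

```python
import math


def rotation_about_y(angle_rad):
    """Return a 3x3 rotation matrix for a head turn about the vertical axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def animate_vertices(neutral, motion_units, coeffs, rotation, translation):
    """Deform and pose a face mesh (assumed model, not the book's exact one).

    neutral      : list of (x, y, z) vertices of the neutral face
    motion_units : list of MUs, each a list of per-vertex (dx, dy, dz)
    coeffs       : one tracked coefficient per MU
    rotation     : 3x3 rotation matrix R
    translation  : (tx, ty, tz), an assumed rigid translation
    """
    animated = []
    for v_idx, vertex in enumerate(neutral):
        # Non-rigid part: neutral vertex plus the weighted sum of MU displacements.
        deformed = [
            vertex[k] + sum(c * mu[v_idx][k] for c, mu in zip(coeffs, motion_units))
            for k in range(3)
        ]
        # Rigid part: rotate the deformed vertex, then translate it.
        posed = [
            sum(rotation[r][k] * deformed[k] for k in range(3)) + translation[r]
            for r in range(3)
        ]
        animated.append(posed)
    return animated


# Toy example: two vertices and one hypothetical "mouth opening" MU
# that moves only the first vertex downward.
neutral_face = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
mouth_open_mu = [(0.0, -1.0, 0.0), (0.0, 0.0, 0.0)]
identity = rotation_about_y(0.0)
frame = animate_vertices(neutral_face, [mouth_open_mu], [0.5], identity, (0.0, 0.0, 2.0))
```

Replaying the per-frame coefficients and pose through a function of this kind would produce one animated mesh per video frame, which is the sort of visualization the figure describes.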

3. Applications of Geometric 3D Face Tracking

The facial motion synthesis using tracking results can be used in model-

based face video coding such as [Huang and Tao, 2001, Tu et al., 2003]. In

our face video coding experiments [Tu et al., 2003], we track and encode the

face area using model-based coding. To encode the residual in the face area and

the background, for which a priori knowledge is generally not available, we

use the traditional waveform-based coding method H.26L. This hybrid approach

improves the robustness of the model-based method at the expense of an increased

bit-rate. Eisert et al. [Eisert et al., 2000] proposed a similar hybrid coding

technique using a different model-based 3D facial motion tracking approach.

We capture and code videos of 352 × 240 at 30 Hz. At the same low bit-rate (18

kbits/s), we compare this hybrid coding with the H.26L JM 4.2 reference software.

Figure 4.2 shows three snapshots of a video that we used in our

experiment. This video has 147 frames. For this video,

the peak signal-to-noise ratio (PSNR) around the facial area for hybrid coding is

2 dB higher than for H.26L. Moreover, the hybrid coding results have much

higher visual quality. Because our tracking system works in real-time, it could

be used in a real-time low bit-rate video phone application. More details of the

model-based face video coding application will be discussed in Chapter 9.
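
To make the PSNR comparison and the bit budget concrete, the following sketch computes PSNR over a rectangular region of two 8-bit grayscale frames and the average number of bits available per frame. It is a minimal sketch, not the evaluation code used in the experiments: the function names, the rectangular face box, and the reading of 18 kbits/s as 18,000 bits/s are assumptions.

```python
import math


def region_psnr(original, decoded, box, peak=255.0):
    """PSNR in dB over a rectangular region of two equally sized frames.

    original, decoded : 2D lists of pixel intensities (rows of values)
    box               : (top, left, bottom, right), bottom/right exclusive
    peak              : maximum pixel value (255 for 8-bit video)
    """
    top, left, bottom, right = box
    squared_error = 0.0
    count = 0
    for r in range(top, bottom):
        for c in range(left, right):
            diff = original[r][c] - decoded[r][c]
            squared_error += diff * diff
            count += 1
    if count == 0:
        raise ValueError("region is empty")
    mse = squared_error / count
    if mse == 0.0:
        return math.inf  # identical regions
    return 10.0 * math.log10(peak * peak / mse)


def bits_per_frame(bitrate_bps, frames_per_second):
    """Average bit budget per frame at a constant bit-rate."""
    return bitrate_bps / frames_per_second


# Toy 2x2 frames: every decoded pixel is off by 10 gray levels.
reference = [[100, 100], [100, 100]]
reconstructed = [[110, 110], [110, 110]]
psnr_db = region_psnr(reference, reconstructed, (0, 0, 2, 2))

# 18 kbits/s at 30 Hz, assuming 1 kbit = 1000 bits.
budget = bits_per_frame(18_000, 30)  # 600.0 bits per frame on average
```

Because PSNR is logarithmic in the mean squared error, a 2 dB gain corresponds to an error in the measured region that is lower by a factor of 10^0.2, or about 1.58.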

Besides low bit-rate video coding, the tracking results can be used as the visual

features for audio-visual speaker-independent speech recognition [Zhang et al.,

2000], and emotion recognition [Cohen et al., 2003]. The bimodal speech

recognition system improves the speech recognition rate in noisy environments.

The emotion recognition system is being used to monitor students' emotional

and cognitive states in a computer-aided instruction application. In medical

applications related to facial motion disorders such as facial paralysis, visual cues

are important for both diagnosis and treatment. Therefore, the facial motion

analysis method can be used as a diagnostic tool such as in [Wachtman et al.,
