art in low bitrate video coding. The background is coded as ordinary
video frames by the H.264/JVT codec, and the foreground residuals are coded
with the Intra_16x16 mode of the H.264/JVT coder. At the receiver, the decoder
synthesizes the facial motion according to the received face motion parameters,
reconstructs the foreground and background regions, and recovers the
foreground facial area by summing the synthesized face and the transmitted
foreground residuals. Because most of the facial motion details are captured by
the facial motion parameters, the foreground residual tends to have small
amplitude, so the video can be coded at very low bit rates without losing
much information in the foreground.
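The encoder/decoder relationship for the foreground can be illustrated with a minimal sketch. It assumes frames are 2D lists of 8-bit luma values and, purely as a stand-in for H.264 Intra_16x16 coding, uses a uniform scalar quantizer on the residual. The function names `encode_residual` and `decode_foreground` are hypothetical. The point it shows is that the decoder only needs the synthesized face plus the coded residual, and that the reconstruction error is bounded by the quantizer, not by the face itself.

```python
# Hypothetical sketch of residual coding for the foreground face region.
# Assumption: a uniform scalar quantizer stands in for H.264 Intra_16x16 coding.

def encode_residual(frame, synthesized, step):
    """Encoder side: quantize (actual frame - synthesized face) with a uniform step."""
    return [[round((f - s) / step) for f, s in zip(frow, srow)]
            for frow, srow in zip(frame, synthesized)]

def decode_foreground(synthesized, q_residual, step):
    """Decoder side: synthesized face + dequantized residual, clipped to 0..255."""
    return [[min(255, max(0, s + q * step)) for s, q in zip(srow, qrow)]
            for srow, qrow in zip(synthesized, q_residual)]

# Usage: when the synthesized face is close to the real frame, the residual
# values are small, so few bits are needed and the error stays within step / 2.
frame = [[120, 122], [130, 128]]
synth = [[118, 123], [131, 125]]
q = encode_residual(frame, synth, step=2)
recon = decode_foreground(synth, q, step=2)
```

The closer the synthesizer tracks the real face, the closer the quantized residual is to zero, which is what makes the very low bit rates feasible.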
For the face tracker in the system, we use the geometric 3D face tracking
system described in Chapter 4. The face synthesizer uses the methods presented
in Chapter 5. In the current model-based face video coder, we ignore face
texture variations in the tracker and synthesizer and instead let the H.264/JVT
coder deal with them. In the future, we plan to apply our flexible appearance
model to these texture variations.
1.3 Results
The face tracker runs on a PC with two Pentium™ 4 2.2 GHz processors, a
GeForce 4 video card, and 2 GB of memory. With only one processor employed,
the tracking system reaches 25 frames per second (fps) in rigid tracking mode
and 14 fps in non-rigid tracking mode. We capture and encode 352 × 240 videos
at 30 Hz. At the same low bit rate (18 to 19 kbit/s), we compare this hybrid
coder with the H.26L JM 4.2 reference software. For a face video sequence with
147 frames, the performance comparisons of the two coders are presented in
Table 9.1. Our face video coder achieves a higher Peak Signal-to-Noise Ratio
(PSNR) in the face area and is more computationally efficient.
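Since the comparison reports PSNR for the face area specifically, a small sketch of region-restricted PSNR may help. It assumes the face area is given as a binary mask over 8-bit frames; how the authors actually delimited the face region is not stated here, and `masked_psnr` is a hypothetical helper. It uses the standard definition PSNR = 10 log10(MAX² / MSE), with the MSE averaged only over masked pixels.

```python
import math

def masked_psnr(ref, rec, mask, max_val=255):
    """PSNR in dB computed only over pixels where mask is True (e.g. the face area)."""
    total, count = 0, 0
    for rrow, crow, mrow in zip(ref, rec, mask):
        for r, c, m in zip(rrow, crow, mrow):
            if m:
                total += (r - c) ** 2
                count += 1
    if count == 0:
        raise ValueError("mask selects no pixels")
    mse = total / count
    if mse == 0:
        return math.inf  # identical pixels in the region
    return 10 * math.log10(max_val ** 2 / mse)

# Usage: a well-coded face region next to a badly coded background.
ref = [[100, 100], [50, 50]]
rec = [[101, 99], [0, 0]]
face = [[True, True], [False, False]]
everything = [[True, True], [True, True]]
face_psnr = masked_psnr(ref, rec, face)        # MSE = 1 over the face pixels
frame_psnr = masked_psnr(ref, rec, everything)  # dominated by background error
```

Restricting the average to the face mask is what lets a coder that concentrates its bits on the face score well there even if whole-frame PSNR is lower.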
Figure 9.1 shows three snapshots of a video with 147 frames. One important
result is that our face video coder produces much higher visual quality in the
face area.
