what we would now call a videoconferencing system:
Imagine that we had at the receiver a sort of rubbery model of the human face. Or we
might have a description of such a model stored in the memory of a huge electronic
computer. ... Then, as the person before the transmitter talked, the transmitter
would have to follow the movements of his eyes, lips, and jaws, and other muscular
movements and transmit these so that the model at the receiver could do likewise.
Pierce's dream is a reasonably accurate description of a three-dimensional model-based
approach to the compression of facial image sequences. In this approach, a generic wireframe
model, such as the one shown in Figure 19.13, is constructed using triangles. When encoding
the movements of a specific human face, the model is adjusted to the face by matching features
and the outer contour of the face. The image textures are then mapped onto this wireframe
model to synthesize the face. Once this model is available to both transmitter and receiver, only
changes in the face are transmitted to the receiver. These changes can be classified as global
motion or local motion [254]. Global motion involves movement of the head, while local
motion involves changes in the features—in other words, changes in facial expressions. The
global motion can be modeled in terms of movements of rigid bodies. The facial expressions can
be represented in terms of relative movements of the vertices of the triangles in the wireframe
model. In practice, separating a movement into global and local components can be difficult
because most points on the face will be affected by both the changing position of the head and
the movement due to changes in facial expression. Different approaches have been proposed
to separate these effects [255, 254, 256].
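As a concrete illustration of the decomposition described above, the sketch below updates the vertices of a toy three-vertex wireframe patch by applying a global rigid-body motion (a rotation matrix and a translation vector) followed by per-vertex local displacements representing an expression change. The specific rotation, translation, and displacement values are invented for illustration; this is a minimal sketch of the geometry, not any particular published coder.

```python
import numpy as np

def rotation_z(theta):
    """Rotation about the z-axis by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def update_vertices(vertices, R, t, local_disp):
    """Apply global rigid-body motion (R, t), then add the per-vertex
    local displacements that encode the change in facial expression."""
    return vertices @ R.T + t + local_disp

# Three vertices of one wireframe triangle, as (x, y, z) rows.
verts = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])

R = rotation_z(np.pi / 2)           # global motion: head turn
t = np.array([0.1, 0.0, 0.0])       # global motion: head translation
local = np.zeros_like(verts)
local[2] = [0.0, 0.05, 0.0]         # local motion: one vertex moves

new_verts = update_vertices(verts, R, t, local)
```

Only the parameters of R and t plus the sparse local displacements would need to be transmitted, which is what makes the scheme attractive at very low rates.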
The global movements can be described in terms of rotations and translations. The local
motions, or facial expressions, can be described as a sum of action units (AU), which are a
set of 44 descriptions of basic facial expressions [257]. For example, AU1 corresponds to the
raising of the inner brow and AU2 corresponds to the raising of the outer brow; therefore,
AU1 + AU2 would mean raising the brow.
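The sum-of-action-units idea can be sketched as a weighted sum of per-vertex displacement fields. The AU displacement fields below are hypothetical values on a toy four-vertex brow patch (a real wireframe has many more vertices, and real AU fields come from face modeling); the point is only the additive combination.

```python
import numpy as np

# Hypothetical displacement fields (x, y, z per vertex) for two action
# units on a four-vertex brow patch: inner-left, inner-right,
# outer-left, outer-right.
AU1 = np.array([[0.0, 0.02, 0.0],   # inner brow raised
                [0.0, 0.02, 0.0],
                [0.0, 0.00, 0.0],
                [0.0, 0.00, 0.0]])
AU2 = np.array([[0.0, 0.00, 0.0],
                [0.0, 0.00, 0.0],
                [0.0, 0.02, 0.0],   # outer brow raised
                [0.0, 0.02, 0.0]])

def apply_expression(vertices, action_units, weights):
    """Deform wireframe vertices by a weighted sum of action units."""
    displacement = sum(w * au for w, au in zip(weights, action_units))
    return vertices + displacement

brow = np.zeros((4, 3))                          # neutral brow vertices
raised = apply_expression(brow, [AU1, AU2], [1.0, 1.0])
# AU1 + AU2 together raise the entire brow.
```

The encoder would then only need to send the AU weights rather than the vertex positions themselves.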
Although the synthesis portion of this algorithm is relatively straightforward, the analysis
portion is far from simple. Detecting changes in features, which tend to be rather subtle, is a
very difficult task. There is a substantial amount of research in this area, and if this problem is
resolved, this approach promises rates comparable to the rates of the analysis/synthesis voice
coding schemes. A good starting point for exploring this fascinating area is [258].
19.7 Asymmetric Applications
There are a number of applications in which it is cost effective to shift more of the computational
burden to the encoder. For example, in multimedia applications where a video sequence is
stored on a CD-ROM, the decompression will be performed many times and has to be performed
in real time. However, the compression is performed only once, and there is no need for it
to be in real time. Thus, the encoding algorithms can be significantly more complex. A
similar situation arises in broadcast applications, where for each transmitter there might be
thousands of receivers. In the following sections we will look at the standards developed for
such asymmetric applications.