what we would now call a videoconferencing system:
Imagine that we had at the receiver a sort of rubbery model of the human face. Or we
might have a description of such a model stored in the memory of a huge electronic
computer. ... Then, as the person before the transmitter talked, the transmitter
would have to follow the movements of his eyes, lips, and jaws, and other muscular
movements and transmit these so that the model at the receiver could do likewise.
Pierce's dream is a reasonably accurate description of a three-dimensional model-based
approach to the compression of facial image sequences. In this approach, a generic wireframe
model, such as the one shown in Figure 19.13, is constructed using triangles. When encoding
the movements of a specific human face, the model is adjusted to the face by matching features
and the outer contour of the face. The image textures are then mapped onto this wireframe
model to synthesize the face. Once this model is available to both transmitter and receiver, only
changes in the face are transmitted to the receiver. These changes can be classified as global
motion or local motion [254]. Global motion involves movement of the head, while local
motion involves changes in the features—in other words, changes in facial expressions. The
global motion can be modeled in terms of movements of rigid bodies. The facial expressions can
be represented in terms of relative movements of the vertices of the triangles in the wireframe
model. In practice, separating a movement into global and local components can be difficult
because most points on the face will be affected by both the changing position of the head and
the movement due to changes in facial expression. Different approaches have been proposed
to separate these effects [255, 254, 256].
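As a concrete illustration of the decomposition described above, the sketch below updates the vertices of a toy three-vertex wireframe patch by applying a global rigid-body motion (a rotation matrix and a translation vector) followed by per-vertex local displacements representing an expression change. The specific rotation, translation, and displacement values are invented for illustration; this is a minimal sketch of the geometry, not any particular published coder.

```python
import numpy as np

def rotation_z(theta):
    """Rotation about the z-axis by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def update_vertices(vertices, R, t, local_disp):
    """Apply global rigid-body motion (R, t), then add the per-vertex
    local displacements that encode the change in facial expression."""
    return vertices @ R.T + t + local_disp

# Three vertices of one wireframe triangle, as (x, y, z) rows.
verts = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])

R = rotation_z(np.pi / 2)           # global motion: head turn
t = np.array([0.1, 0.0, 0.0])       # global motion: head translation
local = np.zeros_like(verts)
local[2] = [0.0, 0.05, 0.0]         # local motion: one vertex moves

new_verts = update_vertices(verts, R, t, local)
```

Only the parameters of R and t plus the sparse local displacements would need to be transmitted, which is what makes the scheme attractive at very low rates.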
The global movements can be described in terms of rotations and translations. The local
motions, or facial expressions, can be described as a sum of action units (AU), which are a
set of 44 descriptions of basic facial expressions [257]. For example, AU1 corresponds to the
raising of the inner brow and AU2 corresponds to the raising of the outer brow; therefore,
AU1 + AU2 would mean raising the brow.
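The sum-of-action-units idea can be sketched as a weighted sum of per-vertex displacement fields. The AU displacement fields below are hypothetical values on a toy four-vertex brow patch (a real wireframe has many more vertices, and real AU fields come from face modeling); the point is only the additive combination.

```python
import numpy as np

# Hypothetical displacement fields (x, y, z per vertex) for two action
# units on a four-vertex brow patch: inner-left, inner-right,
# outer-left, outer-right.
AU1 = np.array([[0.0, 0.02, 0.0],   # inner brow raised
                [0.0, 0.02, 0.0],
                [0.0, 0.00, 0.0],
                [0.0, 0.00, 0.0]])
AU2 = np.array([[0.0, 0.00, 0.0],
                [0.0, 0.00, 0.0],
                [0.0, 0.02, 0.0],   # outer brow raised
                [0.0, 0.02, 0.0]])

def apply_expression(vertices, action_units, weights):
    """Deform wireframe vertices by a weighted sum of action units."""
    displacement = sum(w * au for w, au in zip(weights, action_units))
    return vertices + displacement

brow = np.zeros((4, 3))                          # neutral brow vertices
raised = apply_expression(brow, [AU1, AU2], [1.0, 1.0])
# AU1 + AU2 together raise the entire brow.
```

The encoder would then only need to send the AU weights rather than the vertex positions themselves.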
Although the synthesis portion of this algorithm is relatively straightforward, the analysis
portion is far from simple. Detecting changes in features, which tend to be rather subtle, is a
very difficult task. There is a substantial amount of research in this area, and if this problem is
resolved, this approach promises rates comparable to the rates of the analysis/synthesis voice
coding schemes. A good starting point for exploring this fascinating area is [258].
19.7 Asymmetric Applications
There are a number of applications in which it is cost effective to shift more of the computational
burden to the encoder. For example, in multimedia applications where a video sequence is
stored on a CD-ROM, the decompression will be performed many times and has to be performed
in real time. However, the compression is performed only once, and there is no need for it
to be in real time. Thus, the encoding algorithms can be significantly more complex. A
similar situation arises in broadcast applications, where for each transmitter there might be
thousands of receivers. In the following sections we will look at the standards developed for
such asymmetric applications.