lossy compression is to model the source output and send the model parameters to the receiver
instead of the estimates of the source output. The receiver tries to synthesize the source output
based on the received model parameters.
Consider an image transmission system that works as follows. At the transmitter, we have
a person who examines the image to be transmitted and comes up with a description of the
image. At the receiver, we have another person who then proceeds to create that image. For
example, suppose the image we wish to transmit is a picture of a field of sunflowers. Instead
of trying to send the picture, we simply send the words “field of sunflowers.” The person at
the receiver paints a picture of a field of sunflowers on a piece of paper and gives it to the
user. Thus, an image of an object is transmitted from the transmitter to the receiver in a highly
compressed form. This approach towards compression should be familiar to listeners of sports
broadcasts on radio. It requires that both transmitter and receiver work with the same model.
In terms of sports broadcasting, this means that the listener has a mental picture of the sports
arena, and both the broadcaster and listener attach the same meaning to the same terminology.
This approach works for sports broadcasting because the source being modeled functions
under very restrictive rules. In a basketball game, when the referee calls a dribbling foul,
listeners generally don't picture a drooling chicken. If the source violates the rules, the reconstruction
would suffer. If the basketball players suddenly decided to put on a ballet performance, the
transmitter (sportscaster) would be hard pressed to represent the scene accurately to the
receiver. Therefore, it seems that this approach to compression can only be used for artificial
activities that function according to man-made rules. Of the sources that we are interested in,
only text fits this description, and the rules that govern the generation of text are complex and
differ widely from language to language.
Fortunately, while natural sources may not follow man-made rules, they are subject to the
laws of physics, which can prove to be quite restrictive. This is particularly true of speech. No
matter what language is being spoken, the speech is generated using machinery that is not very
different from person to person. Moreover, this machinery has to obey certain physical laws
that substantially limit the behavior of outputs. Therefore, speech can be analyzed in terms
of a model, and the model parameters can be extracted and transmitted to the receiver. At
the receiver the speech can be synthesized using the model. This analysis/synthesis approach
was first employed by Homer Dudley at Bell Laboratories, who developed what is known as
the channel vocoder (described in the next section). Actually, the synthesis portion had been
attempted even earlier by Kempelen Farkas Lovag (1734-1804). He developed a “speaking
machine” in which the vocal tract was modeled by a flexible tube whose shape could be modified
by an operator. Sound was produced by forcing air through this tube using bellows [226].
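To make the analysis/synthesis idea concrete, the following sketch models each frame of a signal
with a small all-pole (autoregressive) filter: the transmitter solves for the filter coefficients and a
gain and would send only those, and the receiver regenerates a signal with a similar spectral shape
by driving the filter with a generic excitation. This is only an illustrative toy, not the channel
vocoder of the next section; the frame length, model order of 10, and white-noise excitation are
assumptions made here for the example.

import numpy as np

def analyze_frame(frame, order=10):
    # Transmitter side: fit an order-p all-pole (AR) model to one frame.
    # The (coefficients, gain) pair plays the role of the "model parameters"
    # that are transmitted instead of the samples themselves.
    r = np.array([frame[:len(frame) - k] @ frame[k:] for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])          # Yule-Walker normal equations
    gain = np.sqrt(max(r[0] - a @ r[1:order + 1], 0.0) / len(frame))
    return a, gain

def synthesize_frame(a, gain, length, rng):
    # Receiver side: drive the all-pole filter defined by the received
    # coefficients with a generic excitation (white noise in this sketch).
    p = len(a)
    out = np.zeros(length + p)                      # first p entries are zero initial state
    excitation = gain * rng.standard_normal(length)
    for n in range(length):
        past = out[n + p - 1::-1][:p]               # p most recent outputs, newest first
        out[n + p] = a @ past + excitation[n]
    return out[p:]

# Toy demo: a noisy sinusoid stands in for one frame of speech.
rng = np.random.default_rng(0)
t = np.arange(240)
frame = np.sin(2 * np.pi * 0.03 * t) + 0.1 * rng.standard_normal(t.size)
a, g = analyze_frame(frame)
recon = synthesize_frame(a, g, frame.size, rng)
print("sent", a.size + 1, "parameters instead of", frame.size, "samples")

In this toy, a frame of 240 samples is reduced to 11 numbers, which conveys the flavor of the
compression that an analysis/synthesis scheme aims for.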
Unlike speech, images are generated in a variety of different ways; therefore, the
analysis/synthesis approach does not seem very useful for image or video compression. However,
if we restrict the class of images to “talking heads” of the type we would encounter in a
videoconferencing situation, we might be able to satisfy the conditions required for this approach.
When we talk, our facial gestures are restricted by the way our faces are constructed and by
the physics of motion. This realization has led to the new field of model-based video coding
(see Chapter 19).
A totally different approach to image compression based on the properties of self-similarity
is the fractal coding approach. While this approach does not explicitly depend on some physical