Some of the material presented in this chapter is not essential for understanding the techniques
described in this book. However, to follow some of the literature in this area, familiarity
with these topics is necessary. We have marked these sections accordingly. If you are primarily
interested in the techniques, you may wish to skip these sections, at least on first reading. On
the other hand, if you wish to delve more deeply into these topics, we have included a list of
resources at the end of this chapter that provide a more mathematically rigorous treatment of
this material.
When we were looking at lossless compression, one thing we never had to worry about
was how the reconstructed sequence would differ from the original sequence. By definition,
the reconstruction of a losslessly compressed sequence is identical to the original sequence.
However, there is only a limited amount of compression that can be obtained with lossless
compression. There is a floor (a hard one) defined by the entropy of the source, below which
we cannot drive the size of the compressed sequence. As long as we wish to preserve all of
the information in the source, the entropy, like the speed of light, is a fundamental limit.
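As a rough numerical illustration of this floor (not an example worked out in the text itself), the Python sketch below computes the first-order entropy of a hypothetical four-symbol source; the symbol probabilities and the helper name entropy are our own choices.

```python
import math

def entropy(probabilities):
    """First-order entropy in bits per symbol: H = -sum(p * log2 p)."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Hypothetical four-symbol source; any lossless scheme needs at least
# H bits per symbol on average, no matter how clever the code.
probs = [0.5, 0.25, 0.125, 0.125]
print(f"Entropy floor: {entropy(probs):.3f} bits/symbol")  # 1.750
```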
The limited amount of compression available from using lossless compression schemes
may be acceptable in several circumstances. The storage or transmission resources available
to us may be sufficient to handle our data requirements after lossless compression. Or the
possible consequences of a loss of information may be much more expensive than the cost of
additional storage and/or transmission resources. This would be the case with the storage and
archiving of bank records; an error in the records could turn out to be much more expensive
than the cost of buying additional storage media.
If neither of these conditions holds, that is, if resources are limited and we do not require
absolute integrity, we can improve the amount of compression by accepting a certain degree
of loss during the compression process. Performance measures are necessary to determine
the efficiency of our lossy compression schemes. For the lossless compression schemes, we
essentially used only the rate as the performance measure. That would not be feasible for lossy
compression. If rate were the only criterion for lossy compression schemes, where loss of
information is permitted, the best lossy compression scheme would be simply to throw away
all the data! Therefore, we need some additional performance measure, such as some measure
of the difference between the original and reconstructed data, which we will refer to as the
distortion in the reconstructed data. In the next section, we will look at some of the more
well-known measures of difference and discuss their advantages and shortcomings.
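As a preview of one such measure, the sketch below computes a mean squared difference between a hypothetical original sequence and its reconstruction; the sample values and the function name mean_squared_error are illustrative assumptions, not data from the text.

```python
def mean_squared_error(original, reconstructed):
    """Average squared difference between source and reconstruction."""
    return sum((x - y) ** 2 for x, y in zip(original, reconstructed)) / len(original)

# Hypothetical data: the reconstruction differs slightly from the source.
original = [10, 12, 9, 14, 11]
reconstructed = [10, 11, 9, 15, 12]
print(mean_squared_error(original, reconstructed))  # 0.6
```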
In the best of all possible worlds, we would like to incur the minimum amount of distortion
while compressing to the lowest rate possible. Obviously, there is a trade-off between
minimizing the rate and keeping the distortion small. The extreme cases are when we transmit
no information, in which case the rate is zero, or keep all the information, in which case
the distortion is zero. The rate for a discrete source is simply the entropy. The study of the
situations between these two extremes is called rate distortion theory. In this chapter we will
take a brief look at some important concepts related to this theory.
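To put rough numbers on these two extremes, the sketch below evaluates the classical rate distortion function of a binary source with P(1) = p under Hamming distortion, R(D) = H_b(p) − H_b(D) for 0 ≤ D ≤ p and R(D) = 0 otherwise. This standard result is quoted here only for orientation; the parameter value p = 0.3 and the function names are our own.

```python
import math

def binary_entropy(q):
    """Binary entropy H_b(q) in bits."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def rate_distortion_binary(p, D):
    """R(D) for a Bernoulli(p) source (p <= 1/2) with Hamming distortion."""
    if D >= p:
        return 0.0  # allowed distortion so large that no bits need be sent
    return binary_entropy(p) - binary_entropy(D)

p = 0.3
print(rate_distortion_binary(p, 0.0))  # distortion zero: rate equals the entropy H_b(0.3)
print(rate_distortion_binary(p, 0.3))  # rate zero: distortion as large as p
print(rate_distortion_binary(p, 0.1))  # an intermediate point on the trade-off curve
```

At D = 0 the rate equals the source entropy, and once D reaches p the rate drops to zero, matching the two extremes described above.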
We will need to expand the dictionary of models available for our use, for several reasons.
First, because we are now able to introduce distortion, we need to determine how to add
distortion intelligently. For this, we often need to look at the sources somewhat differently
than we have done previously. Another reason is that we will be looking at compression
schemes for sources that are analog in nature, even though we have treated them as discrete
sources in the past. We need models that more precisely describe the true nature of these