of the size of the original data. In this particular example, the compression ratio calculated in
this manner would be 75%.
Another way of reporting compression performance is to provide the average number of
bits required to represent a single sample. This is generally referred to as the rate. For example,
in the case of the compressed image described above, if we assume 8 bits per byte (or pixel),
the average number of bits per pixel in the compressed representation is 2. Thus, we would
say that the rate is 2 bits per pixel.
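The arithmetic behind these two figures can be made concrete with a small sketch. The image dimensions below are assumptions chosen only to illustrate the calculation; any original size compressed to a quarter of its bytes gives the same numbers.

```python
# Hypothetical example: a 256 x 256 grayscale image stored at 8 bits per pixel,
# compressed to one quarter of its original size.
original_bytes = 256 * 256              # one byte (8 bits) per pixel
compressed_bytes = original_bytes // 4  # compressed representation

# Compression performance expressed as a percentage reduction in size
reduction = 100 * (original_bytes - compressed_bytes) / original_bytes

# Rate: average number of bits per pixel in the compressed representation
rate = 8 * compressed_bytes / original_bytes

print(reduction)  # 75.0 (percent)
print(rate)       # 2.0 (bits per pixel)
```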
In lossy compression, the reconstruction differs from the original data. Therefore, in
order to determine the efficiency of a compression algorithm, we have to have some way of
quantifying the difference. The difference between the original and the reconstruction is often
called the distortion. (We will describe several measures of distortion in Chapter 8.) Lossy
techniques are generally used for the compression of data that originate as analog signals, such
as speech and video. In compression of speech and video, the final arbiter of quality is human.
Because human responses are difficult to model mathematically, many approximate measures
of distortion are used to determine the quality of the reconstructed waveforms. We will discuss
this topic in more detail in Chapter 8.
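One widely used approximate measure of this kind is the mean squared error (MSE) between the original samples and the reconstruction. The sketch below uses made-up sample values purely for illustration; MSE is only one of the measures the text alludes to, not the definitive choice.

```python
# A sketch of one commonly used distortion measure, the mean squared error
# (MSE), between an original signal and its lossy reconstruction.
# The sample values here are invented for illustration.
original = [10, 12, 9, 14, 11]
reconstruction = [10, 11, 9, 15, 11]

# Average of the squared sample-by-sample differences
mse = sum((x - y) ** 2 for x, y in zip(original, reconstruction)) / len(original)

print(mse)  # 0.4
```

A smaller MSE corresponds to higher fidelity in the mathematical sense, though, as the text notes, it may not track perceived quality exactly.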
Other terms that are also used when talking about differences between the reconstruction
and the original are fidelity and quality. When we say that the fidelity or quality of a recon-
struction is high, we mean that the difference between the reconstruction and the original is
small. Whether this difference is a mathematical difference or a perceptual difference should
be evident from the context.
1.2 Modeling and Coding
While reconstruction requirements may force the decision of whether a compression scheme
is to be lossy or lossless, the exact compression scheme we use will depend on a number of
different factors. Some of the most important factors are the characteristics of the data that need
to be compressed. A compression technique that will work well for the compression of text may
not work well for compressing images. Each application presents a different set of challenges.
There is a saying attributed to Bob Knight, the former basketball coach at Indiana University
and Texas Tech University: “If the only tool you have is a hammer, you approach every problem
as if it were a nail.” Our intention in this topic is to provide you with a large number of tools
that you can use to solve a particular data compression problem. It should be remembered that
data compression, if it is a science at all, is an experimental science. The approach that works
best for a particular application will depend to a large extent on the redundancies inherent in
the data.
The development of data compression algorithms for a variety of data can be divided
into two phases. The first phase is usually referred to as modeling . In this phase, we try to
extract information about any redundancy that exists in the data and describe the redundancy
in the form of a model. The second phase is called coding . A description of the model and
a “description” of how the data differ from the model are encoded, generally using a binary
alphabet. The difference between the data and the model is often referred to as the residual .
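The modeling/coding split can be sketched in a few lines. The sequence and the linear model below are assumptions invented for illustration: the model is fit by inspection, and the point is only that the residuals span a much smaller range of values than the data, so they can be coded with fewer bits.

```python
# Illustrative sketch of the modeling/coding split: describe a made-up
# sequence with a simple linear model, then keep only the residuals
# (data minus model), which the coding phase would encode.
data = [9, 11, 11, 11, 14, 13, 15, 17, 16, 17]

# Hypothetical model chosen by inspection: the n-th sample is about n + 9
model = [n + 9 for n in range(len(data))]

# Residual: how the data differ from the model
residual = [x - m for x, m in zip(data, model)]

print(residual)  # [0, 1, 0, -1, 1, -1, 0, 1, -1, -1]
```

The original samples take values from 9 to 17, while every residual lies in {-1, 0, 1}, a three-letter alphabet that is far cheaper to encode; the decoder recovers the data exactly by adding the residuals back to the model.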