With our stated assumptions, the entropy for this source is 3.25 bits. This means that the best
scheme we could find for coding this sequence could only code it at 3.25 bits/sample.
However, if we assume that there was sample-to-sample correlation between the samples
and we remove the correlation by taking differences of neighboring sample values, we arrive
at the residual sequence
1 1 1 -1 1 1 1 -1 1 1 1 1 1 -1 1 1

This sequence is constructed using only two values with probabilities: P(1) = 13/16 and P(-1) = 3/16. The entropy in this case is 0.70 bits per symbol. Of course, knowing only
this sequence would not be enough for the receiver to reconstruct the original sequence. The
receiver must also know the process by which this sequence was generated from the original
sequence. The process depends on our assumptions about the structure of the sequence. These
assumptions are called the model for the sequence. In this case, the model for the sequence is
x_n = x_{n-1} + r_n
where x_n is the nth element of the original sequence and r_n is the nth element of the residual sequence. This model is called a static model because its parameters do not change with n. A model whose parameters change or adapt with n to the changing characteristics of the data is called an adaptive model.
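As a minimal illustration of this model (a sketch added here, not part of the original development), the following Python fragment forms the residual of a hypothetical 16-sample sequence and estimates the first-order entropy of both sequences. The input sequence is chosen purely so that its figures match those quoted above: 3.25 bits/sample before differencing and roughly 0.70 bits/sample after.

```python
from collections import Counter
from math import log2

def first_order_entropy(seq):
    """Entropy estimate in bits/symbol from empirical symbol frequencies."""
    n = len(seq)
    return -sum((c / n) * log2(c / n) for c in Counter(seq).values())

# Hypothetical source sequence, chosen so the numbers match the text.
x = [1, 2, 3, 2, 3, 4, 5, 4, 5, 6, 7, 8, 9, 8, 9, 10]

# Residual under the static model x_n = x_{n-1} + r_n
# (the first residual is taken to be the first sample itself).
r = [x[0]] + [x[i] - x[i - 1] for i in range(1, len(x))]

print(first_order_entropy(x))   # 3.25 bits/symbol
print(first_order_entropy(r))   # ~0.70 bits/symbol (thirteen 1s, three -1s)
```

Because the model is invertible, a receiver that knows it can run the recursion forward and recover x exactly from r, which is why the model itself has to be shared along with the residual sequence.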
We see that knowing something about the structure of the data can help to “reduce the
entropy.” We have put “reduce the entropy” in quotes because the entropy of the source is a
measure of the amount of information generated by the source. As long as the information
generated by the source is preserved (in whatever representation), the entropy remains the
same. What we are reducing is our estimate of the entropy. The “actual” structure of the data
in practice is generally unknowable, but anything we can learn about the data can help us to
estimate the actual source entropy. Theoretically, as seen in Equation (2), we accomplish this
in our definition of the entropy by picking larger and larger blocks of data to calculate the
probability over, letting the size of the block go to infinity.
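To make this block argument concrete, here is a small sketch (an illustration under the stationarity assumptions above, not something from the original text, and the function name is ours) that estimates entropy per symbol from the empirical probabilities of non-overlapping blocks of a chosen length.

```python
from collections import Counter
from math import log2

def blockwise_entropy_estimate(seq, block_size):
    """Empirical entropy in bits per original symbol, estimated from the
    relative frequencies of non-overlapping blocks of length block_size."""
    blocks = [tuple(seq[i:i + block_size])
              for i in range(0, len(seq) - block_size + 1, block_size)]
    total = len(blocks)
    h_block = -sum((c / total) * log2(c / total)
                   for c in Counter(blocks).values())
    return h_block / block_size  # normalize from bits/block to bits/symbol
```

The practical catch is data: the number of distinct blocks grows exponentially with the block size, so reliable probability estimates for large blocks need far more data than is usually available, which is one form of the limitation mentioned below.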
Consider the following contrived sequence:
1 2 1 2 3 3 3 3 1 2 3 3 3 3 1 2 3 3 1 2
Obviously, there is some structure to this data. However, if we look at it one symbol at a
time, the structure is difficult to extract. Consider the probabilities: P(1) = P(2) = 1/4, and P(3) = 1/2. The entropy is 1.5 bits/symbol. This particular sequence consists of 20 symbols;
therefore, the total number of bits required to represent this sequence is 30. Now let's take
the same sequence and look at it in blocks of two. Obviously, there are only two symbols,
1 2 and 3 3. The probabilities are P(1 2) = 1/2 and P(3 3) = 1/2, and the entropy is 1 bit/symbol.
As there are 10 such symbols in the sequence, we need a total of 10 bits to represent the
entire sequence, a reduction by a factor of three (see the sketch at the end of this section). The theory says we can always extract the
structure of the data by taking larger and larger block sizes; in practice, there are limitations
to this approach. To avoid these limitations, we try to obtain an accurate model for the data
and code the source with respect to the model. In Section 2.3, we describe some of the models commonly used in lossless compression algorithms. But before we do that, let's make a slight detour and see a more rigorous development of the expression for average information.
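Before taking that detour, the arithmetic in the block-of-two example above can be double-checked with a few lines (again an added illustration, not part of the original text), using the probabilities stated in the example.

```python
from math import log2

# The 20-symbol sequence from the example above.
seq = [1, 2, 1, 2, 3, 3, 3, 3, 1, 2, 3, 3, 3, 3, 1, 2, 3, 3, 1, 2]

# One symbol at a time: P(1) = P(2) = 1/4, P(3) = 1/2.
h_single = -(0.25 * log2(0.25) + 0.25 * log2(0.25) + 0.5 * log2(0.5))
print(h_single, len(seq) * h_single)   # 1.5 bits/symbol, 30.0 bits total

# Blocks of two: only the pairs (1, 2) and (3, 3) ever occur.
pairs = [(seq[i], seq[i + 1]) for i in range(0, len(seq), 2)]
print(sorted(set(pairs)))              # [(1, 2), (3, 3)]

# Each pair has probability 1/2, so 1 bit per block and 10 blocks -> 10 bits.
h_pair = -(0.5 * log2(0.5) + 0.5 * log2(0.5))
print(h_pair, len(pairs) * h_pair)     # 1.0 bit/block, 10.0 bits total
```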