With our stated assumptions, the entropy for this source is 3.25 bits. This means that the best
scheme we could find for coding this sequence could only code it at 3.25 bits/sample.
However, if we assume that there was sample-to-sample correlation between the samples
and we remove the correlation by taking differences of neighboring sample values, we arrive
at the residual sequence
1 1 1 -1 1 1 1 -1 1 1 1 1 1 -1 1 1

This sequence is constructed using only two values with probabilities: P(1) = 13/16 and P(-1) = 3/16. The entropy in this case is 0.70 bits per symbol. Of course, knowing only
this sequence would not be enough for the receiver to reconstruct the original sequence. The
receiver must also know the process by which this sequence was generated from the original
sequence. The process depends on our assumptions about the structure of the sequence. These
assumptions are called the model for the sequence. In this case, the model for the sequence is
x_n = x_{n-1} + r_n
where x_n is the nth element of the original sequence and r_n is the nth element of the residual sequence. This model is called a static model because its parameters do not change with n. A model whose parameters change or adapt with n to the changing characteristics of the data is called an adaptive model.
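As a minimal illustration of this model (a sketch added here, not part of the original development), the following Python fragment forms the residual of a hypothetical 16-sample sequence and estimates the first-order entropy of both sequences. The input sequence is chosen purely so that its figures match those quoted above: 3.25 bits/sample before differencing and roughly 0.70 bits/sample after.

```python
from collections import Counter
from math import log2

def first_order_entropy(seq):
    """Entropy estimate in bits/symbol from empirical symbol frequencies."""
    n = len(seq)
    return -sum((c / n) * log2(c / n) for c in Counter(seq).values())

# Hypothetical source sequence, chosen so the numbers match the text.
x = [1, 2, 3, 2, 3, 4, 5, 4, 5, 6, 7, 8, 9, 8, 9, 10]

# Residual under the static model x_n = x_{n-1} + r_n
# (the first residual is taken to be the first sample itself).
r = [x[0]] + [x[i] - x[i - 1] for i in range(1, len(x))]

print(first_order_entropy(x))   # 3.25 bits/symbol
print(first_order_entropy(r))   # ~0.70 bits/symbol (thirteen 1s, three -1s)
```

Because the model is invertible, a receiver that knows it can run the recursion forward and recover x exactly from r, which is why the model itself has to be shared along with the residual sequence.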
We see that knowing something about the structure of the data can help to “reduce the
entropy.” We have put “reduce the entropy” in quotes because the entropy of the source is a
measure of the amount of information generated by the source. As long as the information
generated by the source is preserved (in whatever representation), the entropy remains the
same. What we are reducing is our estimate of the entropy. The “actual” structure of the data
in practice is generally unknowable, but anything we can learn about the data can help us to
estimate the actual source entropy. Theoretically, as seen in Equation (2), we accomplish this
in our definition of the entropy by picking larger and larger blocks of data to calculate the
probability over, letting the size of the block go to infinity.
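To make this block argument concrete, here is a small sketch (an illustration under the stationarity assumptions above, not something from the original text, and the function name is ours) that estimates entropy per symbol from the empirical probabilities of non-overlapping blocks of a chosen length.

```python
from collections import Counter
from math import log2

def blockwise_entropy_estimate(seq, block_size):
    """Empirical entropy in bits per original symbol, estimated from the
    relative frequencies of non-overlapping blocks of length block_size."""
    blocks = [tuple(seq[i:i + block_size])
              for i in range(0, len(seq) - block_size + 1, block_size)]
    total = len(blocks)
    h_block = -sum((c / total) * log2(c / total)
                   for c in Counter(blocks).values())
    return h_block / block_size  # normalize from bits/block to bits/symbol
```

The practical catch is data: the number of distinct blocks grows exponentially with the block size, so reliable probability estimates for large blocks need far more data than is usually available, which is one form of the limitation mentioned below.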
Consider the following contrived sequence:
1 2 1 2 3 3 3 3 1 2 3 3 3 3 1 2 3 3 1 2
Obviously, there is some structure to this data. However, if we look at it one symbol at a
time, the structure is difficult to extract. Consider the probabilities: P(1) = P(2) = 1/4, and P(3) = 1/2. The entropy is 1.5 bits/symbol. This particular sequence consists of 20 symbols;
therefore, the total number of bits required to represent this sequence is 30. Now let's take
the same sequence and look at it in blocks of two. Obviously, there are only two symbols,
1 2 and 3 3. The probabilities are P(1 2) = 1/2 and P(3 3) = 1/2, and the entropy is 1 bit/symbol.
As there are 10 such symbols in the sequence, we need a total of 10 bits to represent the
entire sequence, a reduction by a factor of three (see the sketch at the end of this section). The theory says we can always extract the
structure of the data by taking larger and larger block sizes; in practice, there are limitations
to this approach. To avoid these limitations, we try to obtain an accurate model for the data
and code the source with respect to the model. In Section 2.3, we describe some of the models commonly used in lossless compression algorithms. But before we do that, let's make a slight detour and see a more rigorous development of the expression for average information.
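Before taking that detour, the arithmetic in the block-of-two example above can be double-checked with a few lines (again an added illustration, not part of the original text), using the probabilities stated in the example.

```python
from math import log2

# The 20-symbol sequence from the example above.
seq = [1, 2, 1, 2, 3, 3, 3, 3, 1, 2, 3, 3, 3, 3, 1, 2, 3, 3, 1, 2]

# One symbol at a time: P(1) = P(2) = 1/4, P(3) = 1/2.
h_single = -(0.25 * log2(0.25) + 0.25 * log2(0.25) + 0.5 * log2(0.5))
print(h_single, len(seq) * h_single)   # 1.5 bits/symbol, 30.0 bits total

# Blocks of two: only the pairs (1, 2) and (3, 3) ever occur.
pairs = [(seq[i], seq[i + 1]) for i in range(0, len(seq), 2)]
print(sorted(set(pairs)))              # [(1, 2), (3, 3)]

# Each pair has probability 1/2, so 1 bit per block and 10 blocks -> 10 bits.
h_pair = -(0.5 * log2(0.5) + 0.5 * log2(0.5))
print(h_pair, len(pairs) * h_pair)     # 1.0 bit/block, 10.0 bits total
```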