for the best match. The location of the anchor point is known to both the encoder and the decoder. The location of the best content match is signaled to the decoder by encoding the offset δ of the context of this content from the anchor point. We have not specified what we mean by "best" match. The coder takes the utilitarian approach that the best match is the one that ends up providing the most compression. Thus, a longer match farther away from the anchor may not be as advantageous as a shorter match closer to the anchor because of the number of bits required to encode δ. The length of the match λ is also sent to the decoder.
The interesting aspect of this scheme is that it moves away from the idea of exactly matching the past. It provides a much richer environment and flexibility to enhance the compression and will, hopefully, provide a fruitful avenue for further research.
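To make the trade-off concrete, the following sketch (an illustration only, not the coder described above; the variable-length cost model bits_for and the candidate list are assumptions) picks the candidate match that maximizes the net saving: the literal bits the match replaces, minus the bits spent encoding the offset δ and the length λ.

```python
import math

def bits_for(n):
    # Assumed cost model: a variable-length integer code that spends
    # roughly log2(n) bits to represent the value n.
    return max(1, math.ceil(math.log2(n + 1)))

def best_match(candidates, bits_per_symbol=8):
    """Return the (delta, lam) pair with the largest net bit saving.

    candidates: (delta, lam) pairs, where delta is the offset of the
    match from the anchor point and lam is the match length.
    """
    best, best_saving = None, 0
    for delta, lam in candidates:
        cost = bits_for(delta) + bits_for(lam)   # bits sent to the decoder
        saving = lam * bits_per_symbol - cost    # literal bits no longer sent
        if saving > best_saving:
            best, best_saving = (delta, lam), saving
    return best, best_saving

# A shorter match near the anchor beats a longer match far away,
# because encoding the large offset costs more than the extra symbols save.
print(best_match([(10_000_000, 7), (3, 6)]))   # -> ((3, 6), 43)
```

Under this cost model the length-6 match at offset 3 wins, which is exactly the situation described above: the distant length-7 match spends too many bits on δ.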
6.6 Dynamic Markov Compression
Quite often, the probabilities of the values that the next symbol in a sequence can take on depend not only on the current value but on past values as well. The ppm scheme relies on this longer-range correlation and, in some sense, reflects the application for which it is most used: text compression. Dynamic Markov compression (DMC), introduced by
Cormack and Horspool [ 80 ], uses a more general framework to take advantage of relationships
and correlations, or contexts, that extend beyond a single symbol.
Consider the sequence of pixels in a scanned document. The sequence consists of runs
of black and white pixels. If we represent black by 0 and white by 1, we have runs of 0s
and 1s. If the current value is 0, the probability that the next value is 0 is higher than if the
current value was 1. The fact that we have two different sets of probabilities is reflected in
the two-state model shown in Figure 6.2 . Consider state A . The probability of the next value
being 1 changes depending on whether we reached state A from state B or from state A itself.
We can have the model reflect this by cloning state A, as shown in Figure 6.3, to create state A′. Now if we see a white pixel after a run of black pixels, we go to state A′. The probability that the next value will be 1 is very high in this state. This way, when we estimate probabilities
for the next pixel value, we take into account not only the value of the current pixel but also
the value of the previous pixel.
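A minimal sketch of the cloning step may help; it is an illustrative rendering inferred from the text (the exact transition layout of Figures 6.2 and 6.3 is not reproduced here, and the count handling is simplified relative to Cormack and Horspool's algorithm). Each state keeps counts of the 0s and 1s seen on leaving it, and cloning gives "A reached from B" its own copy A′ with its own counts.

```python
class State:
    """One model state: bit counts and successor states for 0 and 1."""
    def __init__(self, name):
        self.name = name
        self.count = [1, 1]        # occurrences of 0 and 1 seen in this state
        self.next = [None, None]   # successor state on a 0 and on a 1

    def prob(self, bit):
        # Probability estimate for the next bit from the accumulated counts.
        return self.count[bit] / (self.count[0] + self.count[1])

def update(state, bit):
    """Record one observed bit and move to the successor state."""
    state.count[bit] += 1
    return state.next[bit]

# Two-state model in the spirit of Figure 6.2: black = 0, white = 1;
# a 0 leads to state B, a 1 leads to state A.
A, B = State("A"), State("B")
A.next = [B, A]
B.next = [B, A]

# Cloning (Figure 6.3): a white pixel arriving from B now leads to the
# clone A', so "A reached from B" and "A reached from A" accumulate
# separate statistics and hence separate probability estimates.
A_clone = State("A'")
A_clone.next = list(A.next)   # A' starts with the same successors as A
B.next[1] = A_clone           # white pixel after a black run goes to A'

# Feed a black run followed by a white run through the cloned model.
s = B
for bit in [0, 0, 0, 0, 1, 1, 1, 1]:
    s = update(s, bit)
print(s.name, s.prob(1))      # the white run raises the estimate of P(1)
```

In the full DMC algorithm the counts of the original state are divided between it and the clone, in proportion to how often the redirected transition was used, so that the clone starts with sensible statistics; in this sketch A′ simply starts with fresh counts.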
This process can be continued as long as we wish to take into account longer and longer
histories. “As long as we wish” is a rather vague statement when it comes to implementing
FIGURE 6.2 A two-state model for binary sequences.