for the best match. The location of the anchor point is known to both the encoder and the decoder. The location of the best content match is signaled to the decoder by encoding the offset δ of the context of this content from the anchor point. We have not specified what we mean by "best" match. The coder takes the utilitarian approach that the best match is the one that ends up providing the most compression. Thus, a longer match farther away from the anchor may not be as advantageous as a shorter match closer to the anchor because of the number of bits required to encode δ. The length of the match λ is also sent to the decoder.
The interesting aspect of this scheme is that it moves away from the idea of exactly matching the past. It provides a much richer environment and flexibility to enhance the compression and will, hopefully, provide a fruitful avenue for further research.
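To make the trade-off concrete, the following sketch (an illustration only, not the coder described above; the variable-length cost model bits_for and the candidate list are assumptions) picks the candidate match that maximizes the net saving: the literal bits the match replaces, minus the bits spent encoding the offset δ and the length λ.

```python
import math

def bits_for(n):
    # Assumed cost model: a variable-length integer code that spends
    # roughly log2(n) bits to represent the value n.
    return max(1, math.ceil(math.log2(n + 1)))

def best_match(candidates, bits_per_symbol=8):
    """Return the (delta, lam) pair with the largest net bit saving.

    candidates: (delta, lam) pairs, where delta is the offset of the
    match from the anchor point and lam is the match length.
    """
    best, best_saving = None, 0
    for delta, lam in candidates:
        cost = bits_for(delta) + bits_for(lam)   # bits sent to the decoder
        saving = lam * bits_per_symbol - cost    # literal bits no longer sent
        if saving > best_saving:
            best, best_saving = (delta, lam), saving
    return best, best_saving

# A shorter match near the anchor beats a longer match far away,
# because encoding the large offset costs more than the extra symbols save.
print(best_match([(10_000_000, 7), (3, 6)]))   # -> ((3, 6), 43)
```

Under this cost model the length-6 match at offset 3 wins, which is exactly the situation described above: the distant length-7 match spends too many bits on δ.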
6.6 Dynamic Markov Compression
Quite often, the probabilities of the values that the next symbol in a sequence can take on depend not only on the current value but on past values as well. The ppm scheme relies on this longer-range correlation and, in some sense, reflects the application for which it is most used: text compression. Dynamic Markov compression (DMC), introduced by
Cormack and Horspool [ 80 ], uses a more general framework to take advantage of relationships
and correlations, or contexts, that extend beyond a single symbol.
Consider the sequence of pixels in a scanned document. The sequence consists of runs
of black and white pixels. If we represent black by 0 and white by 1, we have runs of 0s
and 1s. If the current value is 0, the probability that the next value is 0 is higher than if the
current value was 1. The fact that we have two different sets of probabilities is reflected in
the two-state model shown in Figure 6.2 . Consider state A . The probability of the next value
being 1 changes depending on whether we reached state A from state B or from state A itself.
We can have the model reflect this by cloning state A, as shown in Figure 6.3, to create state A′. Now if we see a white pixel after a run of black pixels, we go to state A′. The probability that the next value will be 1 is very high in this state. This way, when we estimate probabilities
for the next pixel value, we take into account not only the value of the current pixel but also
the value of the previous pixel.
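A minimal sketch of the cloning step may help; it is an illustrative rendering inferred from the text (the exact transition layout of Figures 6.2 and 6.3 is not reproduced here, and the count handling is simplified relative to Cormack and Horspool's algorithm). Each state keeps counts of the 0s and 1s seen on leaving it, and cloning gives "A reached from B" its own copy A′ with its own counts.

```python
class State:
    """One model state: bit counts and successor states for 0 and 1."""
    def __init__(self, name):
        self.name = name
        self.count = [1, 1]        # occurrences of 0 and 1 seen in this state
        self.next = [None, None]   # successor state on a 0 and on a 1

    def prob(self, bit):
        # Probability estimate for the next bit from the accumulated counts.
        return self.count[bit] / (self.count[0] + self.count[1])

def update(state, bit):
    """Record one observed bit and move to the successor state."""
    state.count[bit] += 1
    return state.next[bit]

# Two-state model in the spirit of Figure 6.2: black = 0, white = 1;
# a 0 leads to state B, a 1 leads to state A.
A, B = State("A"), State("B")
A.next = [B, A]
B.next = [B, A]

# Cloning (Figure 6.3): a white pixel arriving from B now leads to the
# clone A', so "A reached from B" and "A reached from A" accumulate
# separate statistics and hence separate probability estimates.
A_clone = State("A'")
A_clone.next = list(A.next)   # A' starts with the same successors as A
B.next[1] = A_clone           # white pixel after a black run goes to A'

# Feed a black run followed by a white run through the cloned model.
s = B
for bit in [0, 0, 0, 0, 1, 1, 1, 1]:
    s = update(s, bit)
print(s.name, s.prob(1))      # the white run raises the estimate of P(1)
```

In the full DMC algorithm the counts of the original state are divided between it and the clone, in proportion to how often the redirected transition was used, so that the clone starts with sensible statistics; in this sketch A′ simply starts with fresh counts.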
This process can be continued as long as we wish to take into account longer and longer
histories. “As long as we wish” is a rather vague statement when it comes to implementing
FIGURE 6.2 A two-state model for binary sequences.