The first element of L is s, which gets encoded as a 4. We then move s to the top of the list, which gives us

Position:  0   1   2   3   4   5
Letter:    s   b   e   h   i   t
The next s is encoded as 0. Because s is already at the top of the list, we do not need to make any changes. The next letter is h, which we encode as 3. We then move h to the top of the list:

Position:  0   1   2   3   4   5
Letter:    h   s   b   e   i   t
The next letter is t, which gets encoded as 5. Moving t to the top of the list, we get

Position:  0   1   2   3   4   5
Letter:    t   h   s   b   e   i
The next letter is also a t, so that gets encoded as a 0.
Continuing in this fashion, we get the sequence
40350135015
As we warned, the results are not too impressive with this small sequence, but we can see how we would get large numbers of 0s and small values if the sequence to be encoded were longer. These long runs of 0s could then be efficiently encoded using schemes specifically designed for encoding long repetitions of the same symbol, such as run-length coding, which is described in the next chapter.
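Since the procedure above is mechanical, a short sketch may help. The following Python fragment is a minimal illustration of the move-to-front step just described (the function and variable names are ours, not from the text); given the initial list b, e, h, i, s, t it reproduces the first few codes of the walkthrough.

    # Minimal move-to-front encoder; names are illustrative only.
    def mtf_encode(sequence, alphabet):
        mtf_list = list(alphabet)        # start with the initial ordering
        codes = []
        for symbol in sequence:
            index = mtf_list.index(symbol)   # position of the symbol in the list
            codes.append(index)              # emit that position as the code
            mtf_list.pop(index)              # move the symbol to the top of the list
            mtf_list.insert(0, symbol)
        return codes

    # First few symbols of the walkthrough: s -> 4, s -> 0, h -> 3, t -> 5, t -> 0
    print(mtf_encode("sshtt", ["b", "e", "h", "i", "s", "t"]))   # [4, 0, 3, 5, 0]

Decoding works the same way in reverse: read the index, output the symbol currently at that position, and move that symbol to the top of the list.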
6.5 Associative Coder of Buyanovsky (ACB)
A different approach to using context for compression is employed by the eponymous compression utility developed by George Buyanovsky. The details of this very efficient coder are not well known; however, the way the context is used is interesting, and we will briefly describe this aspect of ACB. More detailed descriptions are available in [78, 79]. The ACB coder develops a sorted dictionary of all encountered contexts. In this it is similar to other context-based encoders. However, it also keeps track of the content of these contexts. The content of a context is what appears after the context. In a traditional left-to-right reading of text, the contexts are unbounded to the left and the contents to the right (to the limits of text that has already been encoded). When encoding, the coder searches for the longest match to the current context, reading right to left. This is, again, not an unusual thing to do. What is interesting is what the coder does after the best match is found. Instead of simply examining the content corresponding to the best-matched context, the coder also examines the contents of the contexts in the neighborhood of the best-matched context. Fenwick [78] describes this process as first finding an anchor point, then searching the contents of the neighboring contexts
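To make the context-sorting idea a little more concrete, the following Python sketch illustrates only that general idea; it is emphatically not the ACB algorithm itself, whose internal details are not public, and the example string, helper names, and data structures are all ours. Each encountered context is stored reversed, so that sorting the list compares contexts right to left, and a binary search then locates the anchor point whose neighboring entries would be examined.

    # Toy illustration of a sorted context dictionary with right-to-left matching.
    import bisect

    def build_context_dictionary(text):
        # One entry per position: (reversed context, content that follows it).
        entries = [(text[:i][::-1], text[i:]) for i in range(1, len(text))]
        entries.sort()
        return entries

    def find_anchor(entries, current_context):
        # The insertion point of the reversed current context sits next to the
        # entries sharing the longest right-to-left match with it.
        key = current_context[::-1]
        return bisect.bisect_left(entries, (key, ""))

    # Arbitrary example string, purely for illustration.
    entries = build_context_dictionary("this is an example")
    anchor = find_anchor(entries, "this is an exam")
    print(anchor, entries[max(anchor - 1, 0)])   # anchor point and one neighbor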
 