Context-Based Compression - Introduction to Data Compression

Databases Reference

In-Depth Information

T A B L E 6 . 1

Count array for

1order

−

context.

Letter

Count

Cum _ Count

t

1

h

1

2

i

1

3

s

1

4

e

1

5

b

1

6

Total_Count

6

we would first attempt to see if the string proba has previously occurred—that is, if a had

previously occurred in the context of prob . If not, we would encode an escape and see if a had

occurred in the context of rob . If the string roba had not occurred previously, we would again

send an escape symbol and try the context ob . Continuing in this manner, we would try the

context b , and failing that, we would see if the letter a (with a zero-order context) had occurred

previously. If a was being encountered for the first time, we would use a model in which

all letters occur with equal probability to encode a . This equiprobable model is sometimes

referred to as the context of order

1.

As the development of the probabilities with respect to each context is an adaptive process,

the count corresponding to a symbol is updated each time the symbol is encountered. The

number of counts to be assigned to the escape symbol is not obvious, and a number of different

approaches have been used. One approach used by Cleary and Witten is to give the escape

symbol a count of one, thus inflating the total count by one. Cleary and Witten call this method

of assigning counts Method A and the resulting algorithm ppma . We will describe some of

the other ways of assigning counts to the escape symbol later in this section.

Before we delve into some of the details, let's work through an example to see how all this

works together. As we will be using arithmetic coding to encode the symbols, you might wish

to refresh your memory of the arithmetic coding algorithms.

−

Example6.3.1:

Let's encode the sequence

/

btithe

Assuming we have already encoded the initial seven characters this

this

bis

bthe

bis , the various counts

and Cum _ Count arrays to be used in the arithmetic coding of the symbols are shown in

Tables 6.1 - 6.4 . In this example, we are assuming that the longest context length is two. This

is a rather small value and is used here to keep the size of the example reasonably small. A

more common value for the longest context length is five.

We will assume that the word length for arithmetic coding is six. Thus, l = 000000 and

/

u

b .The

second-order context for this letter is is . Looking at Table 6.4 , we can see that the letter

=

111111. As this

/

bis has already been encoded, the next letter to be encoded is

/

b is

the first letter in this context with a Cum _ Count value of 1. As the Total _ Count in this case

/

Introduction to Data Compression

Search WWH ::

Custom Search

Home