Databases Reference
In-Depth Information
T A B L E 6 . 1
Count array for
1order
context.
Letter
Count
Cum _ Count
t
1
1
h
1
2
i
1
3
s
1
4
e
1
5
b
1
6
Total_Count
6
we would first attempt to see if the string proba has previously occurred—that is, if a had
previously occurred in the context of prob . If not, we would encode an escape and see if a had
occurred in the context of rob . If the string roba had not occurred previously, we would again
send an escape symbol and try the context ob . Continuing in this manner, we would try the
context b , and failing that, we would see if the letter a (with a zero-order context) had occurred
previously. If a was being encountered for the first time, we would use a model in which
all letters occur with equal probability to encode a . This equiprobable model is sometimes
referred to as the context of order
1.
As the development of the probabilities with respect to each context is an adaptive process,
the count corresponding to a symbol is updated each time the symbol is encountered. The
number of counts to be assigned to the escape symbol is not obvious, and a number of different
approaches have been used. One approach used by Cleary and Witten is to give the escape
symbol a count of one, thus inflating the total count by one. Cleary and Witten call this method
of assigning counts Method A and the resulting algorithm ppma . We will describe some of
the other ways of assigning counts to the escape symbol later in this section.
Before we delve into some of the details, let's work through an example to see how all this
works together. As we will be using arithmetic coding to encode the symbols, you might wish
to refresh your memory of the arithmetic coding algorithms.
Example6.3.1:
Let's encode the sequence
/
/
/
btithe
Assuming we have already encoded the initial seven characters this
this
bis
bthe
bis , the various counts
and Cum _ Count arrays to be used in the arithmetic coding of the symbols are shown in
Tables 6.1 - 6.4 . In this example, we are assuming that the longest context length is two. This
is a rather small value and is used here to keep the size of the example reasonably small. A
more common value for the longest context length is five.
We will assume that the word length for arithmetic coding is six. Thus, l = 000000 and
/
u
b .The
second-order context for this letter is is . Looking at Table 6.4 , we can see that the letter
=
111111. As this
/
bis has already been encoded, the next letter to be encoded is
/
b is
the first letter in this context with a Cum _ Count value of 1. As the Total _ Count in this case
/
 
Search WWH ::




Custom Search