Databases Reference
In-Depth Information
T A B L E 4 . 2
Huffman code for extended
alphabet.
Letter
Probability
Code
a 1 a 1
0.9025
0
a 1 a 2
0.0190
111
a 1 a 3
0.0285
100
a 2 a 1
0.0190
1101
a 2 a 2
0.0004
110011
a 2 a 3
0.0006
110001
a 3 a 1
0.0285
101
a 3 a 2
0.0006
110010
a 3 a 3
0.0009
110000
sequences without having to generate codes for all sequences of that length. The arithmetic
coding technique fulfills this requirement.
In arithmetic coding, a unique identifier or tag is generated for the sequence to be encoded.
This tag corresponds to a binary fraction, which becomes the binary code for the sequence.
In practice, the generation of the tag and the binary code are the same process. However, the
arithmetic coding approach is easier to understand if we conceptually divide the approach into
two phases. In the first phase, a unique identifier or tag is generated for a given sequence
of symbols. This tag is then given a unique binary code. A unique arithmetic code can
be generated for a sequence of length m without the need for generating codewords for all
sequences of length m . This is unlike the situation for Huffman codes. In order to generate
a Huffman code for a sequence of length m , where the code is not a concatenation of the
codewords for the individual symbols, we need to obtain the Huffman codes for all sequences
of length m .
4.3 Coding a Sequence
In order to distinguish a sequence of symbols from another sequence of symbols we need to
tag it with a unique identifier. One possible set of tags for representing sequences of symbols
are the numbers in the unit interval [0,1). Because the number of numbers in the unit interval
is infinite, it should be possible to assign a unique tag to each distinct sequence of symbols. In
order to do this, we need a function that will map sequences of symbols into the unit interval.
A function that maps random variables, and sequences of random variables, into the unit
interval is the cumulative distribution function ( cdf ) of the random variable associated with
the source. This is the function we will use in developing the arithmetic code. (If you are not
familiar with random variables and cumulative distribution functions or need to refresh your
memory, you may wish to look at Appendix A.)
The use of the cumulative distribution function to generate a binary code for a sequence has a
rather interesting history. Shannon, in his original 1948 paper [ 3 ], mentioned an approach using
 
Search WWH ::




Custom Search