It does not take many symbols in a sequence before the coding rate for the arithmetic code
becomes quite close to the entropy. However, recall that for Huffman codes, if we block m
symbols together, the coding rate l_H satisfies

    H(X) ≤ l_H ≤ H(X) + 1/m
The advantage seems to lie with the Huffman code, although the advantage decreases with
increasing m. However, remember that to generate a codeword for a sequence of length m
using the Huffman procedure requires building the entire code for all possible sequences of
length m. If the original alphabet size was k, then the size of the codebook would be k^m.
Taking relatively reasonable values of k = 16 and m = 20 gives a codebook size of 16^20! This
is obviously not a viable option. For the arithmetic coding procedure, we do not need to build
the entire codebook. Instead, we simply obtain the code for the tag corresponding to a given
sequence. Therefore, it is entirely feasible to code sequences of length 20 or much more. In
practice, we can make m large for the arithmetic coder and not for the Huffman coder. This
means that for most sources we can get rates closer to the entropy by using arithmetic coding
than by using Huffman coding. The exceptions are sources whose probabilities are powers of
two. In these cases, the single-letter Huffman code achieves the entropy, and we cannot do
any better with arithmetic coding, no matter how long a sequence we pick.
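To put numbers to this, a quick Python back-of-the-envelope check (using the k = 16 and m = 20 values from the discussion above; purely illustrative):

    # Extended Huffman coding: codebook size versus the 1/m redundancy term.
    k, m = 16, 20                  # alphabet size and block length from the text
    print(k ** m)                  # 1208925819614629174706176 codewords (~1.2e24)
    print(1 / m)                   # 0.05 bits/symbol: the 1/m term in H(X) + 1/m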
The amount of gain also depends on the source. Recall that for Huffman codes, we are
guaranteed to obtain rates within 0.086 + p_max of the entropy, where p_max is the probability
of the most probable letter in the alphabet. If the alphabet size is relatively large and the
probabilities are not too skewed, the maximum probability p_max is generally small. In these cases,
the advantage of arithmetic coding over Huffman coding is small, and it might not be worth
the extra complexity to use arithmetic coding rather than Huffman coding. However, there are
many sources, such as facsimile, in which the alphabet size is small, and the probabilities are
highly unbalanced. In these cases, the use of arithmetic coding is generally worth the added
complexity.
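To make this concrete, the sketch below evaluates the 0.086 + p_max bound for two made-up sources, one large and fairly flat and one skewed binary source; the probabilities are illustrative, not measured facsimile statistics.

    import math

    def entropy(probs):
        # Entropy in bits of a discrete source with the given probabilities.
        return -sum(p * math.log2(p) for p in probs if p > 0)

    flat = [1 / 64] * 64        # large, nearly uniform alphabet
    skewed = [0.97, 0.03]       # highly unbalanced binary source

    for probs in (flat, skewed):
        print(entropy(probs), 0.086 + max(probs))
    # flat:   entropy 6.0 bits,   bound ~0.10 -> Huffman already close to optimal
    # skewed: entropy ~0.19 bits, bound ~1.06 -> arithmetic coding can pay off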
Another major advantage of arithmetic coding is that it is easy to implement a system with
multiple arithmetic codes. This may seem contradictory, as we have claimed that arithmetic
coding is more complex than Huffman coding. However, it is the computational machinery that
causes the increase in complexity. Once we have the computational machinery to implement
one arithmetic code, all we need to implement more than a single arithmetic code is the
availability of more probability tables. If the alphabet size of the source is small, as in the
case of a binary source, there is very little added complexity indeed. In fact, as we shall see
in the next section, it is possible to develop multiplication-free arithmetic coders that are quite
simple to implement (nonbinary multiplication-free arithmetic coders are described in [45]).
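As a rough sketch of why additional codes cost little, the interval-update step below is written once and driven by whichever cumulative-probability table is currently selected; the function and table names are hypothetical, and this is not the multiplication-free coder of [45].

    def encode_step(low, high, cum, symbol):
        # One interval-narrowing step of a generic arithmetic encoder.
        # cum[i] is the total probability of symbols 0..i-1, with cum[0] = 0.
        span = high - low
        return low + span * cum[symbol], low + span * cum[symbol + 1]

    # Two "codes" are simply two probability tables fed to the same machinery.
    table_a = [0.0, 0.7, 0.9, 1.0]    # hypothetical three-symbol model
    table_b = [0.0, 0.2, 0.6, 1.0]    # a second hypothetical model
    low, high = encode_step(0.0, 1.0, table_a, 0)    # -> (0.0, 0.7)
    low, high = encode_step(low, high, table_b, 2)   # -> roughly (0.42, 0.7)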
Finally, it is much easier to adapt arithmetic codes to changing input statistics. All we need
to do is estimate the probabilities of the input alphabet. This can be done by keeping a count
of the letters as they are coded. There is no need to preserve a tree, as with adaptive Huffman
codes. Furthermore, there is no need to generate a code a priori, as in the case of Huffman
coding. This property allows us to separate the modeling and coding procedures in a manner
that is not very feasible with Huffman coding. This separation permits greater flexibility in
the design of compression systems, which can be used to great advantage.
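A minimal sketch of such an adaptive model, assuming a simple count-based estimate with all counts initialized to 1 (these choices are illustrative, not prescribed here):

    class AdaptiveModel:
        # Keep a count for each letter; the probability estimate is the count
        # divided by the running total, so the model tracks the input statistics.
        def __init__(self, alphabet):
            self.counts = {s: 1 for s in alphabet}    # start from a uniform estimate

        def probability(self, symbol):
            return self.counts[symbol] / sum(self.counts.values())

        def update(self, symbol):
            self.counts[symbol] += 1                  # one update per coded symbol

    model = AdaptiveModel("ab")
    for s in "aababa":
        p = model.probability(s)    # this estimate would drive the arithmetic coder
        model.update(s)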