It does not take many symbols in a sequence before the coding rate for the arithmetic code
becomes quite close to the entropy. However, recall that for Huffman codes, if we block m
symbols together, the coding rate l_H satisfies

    H(X) ≤ l_H ≤ H(X) + 1/m
The advantage seems to lie with the Huffman code, although the advantage decreases with
increasing m. However, remember that to generate a codeword for a sequence of length m
using the Huffman procedure requires building the entire code for all possible sequences of
length m. If the original alphabet size was k, then the size of the codebook would be k^m.
Taking relatively reasonable values of k = 16 and m = 20 gives a codebook size of 16^20! This
is obviously not a viable option. For the arithmetic coding procedure, we do not need to build
the entire codebook. Instead, we simply obtain the code for the tag corresponding to a given
sequence. Therefore, it is entirely feasible to code sequences of length 20 or much more. In
practice, we can make m large for the arithmetic coder and not for the Huffman coder. This
means that for most sources we can get rates closer to the entropy by using arithmetic coding
than by using Huffman coding. The exceptions are sources whose probabilities are powers of
two. In these cases, the single-letter Huffman code achieves the entropy, and we cannot do
any better with arithmetic coding, no matter how long a sequence we pick.
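To put numbers to this, a quick Python back-of-the-envelope check (using the k = 16 and m = 20 values from the discussion above; purely illustrative):

    # Extended Huffman coding: codebook size versus the 1/m redundancy term.
    k, m = 16, 20                  # alphabet size and block length from the text
    print(k ** m)                  # 1208925819614629174706176 codewords (~1.2e24)
    print(1 / m)                   # 0.05 bits/symbol: the 1/m term in H(X) + 1/m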
The amount of gain also depends on the source. Recall that for Huffman codes, we are
guaranteed to obtain rates within 0.086 + p_max of the entropy, where p_max is the probability
of the most probable letter in the alphabet. If the alphabet size is relatively large and the
probabilities are not too skewed, the maximum probability p_max is generally small. In these cases,
the advantage of arithmetic coding over Huffman coding is small, and it might not be worth
the extra complexity to use arithmetic coding rather than Huffman coding. However, there are
many sources, such as facsimile, in which the alphabet size is small, and the probabilities are
highly unbalanced. In these cases, the use of arithmetic coding is generally worth the added
complexity.
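To make this concrete, the sketch below evaluates the 0.086 + p_max bound for two made-up sources, one large and fairly flat and one skewed binary source; the probabilities are illustrative, not measured facsimile statistics.

    import math

    def entropy(probs):
        # Entropy in bits of a discrete source with the given probabilities.
        return -sum(p * math.log2(p) for p in probs if p > 0)

    flat = [1 / 64] * 64        # large, nearly uniform alphabet
    skewed = [0.97, 0.03]       # highly unbalanced binary source

    for probs in (flat, skewed):
        print(entropy(probs), 0.086 + max(probs))
    # flat:   entropy 6.0 bits,   bound ~0.10 -> Huffman already close to optimal
    # skewed: entropy ~0.19 bits, bound ~1.06 -> arithmetic coding can pay off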
Another major advantage of arithmetic coding is that it is easy to implement a system with
multiple arithmetic codes. This may seem contradictory, as we have claimed that arithmetic
coding is more complex than Huffman coding. However, it is the computational machinery that
causes the increase in complexity. Once we have the computational machinery to implement
one arithmetic code, all we need to implement more than a single arithmetic code is the
availability of more probability tables. If the alphabet size of the source is small, as in the
case of a binary source, there is very little added complexity indeed. In fact, as we shall see
in the next section, it is possible to develop multiplication-free arithmetic coders that are quite
simple to implement (nonbinary multiplication-free arithmetic coders are described in [45]).
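As a rough sketch of why additional codes cost little, the interval-update step below is written once and driven by whichever cumulative-probability table is currently selected; the function and table names are hypothetical, and this is not the multiplication-free coder of [45].

    def encode_step(low, high, cum, symbol):
        # One interval-narrowing step of a generic arithmetic encoder.
        # cum[i] is the total probability of symbols 0..i-1, with cum[0] = 0.
        span = high - low
        return low + span * cum[symbol], low + span * cum[symbol + 1]

    # Two "codes" are simply two probability tables fed to the same machinery.
    table_a = [0.0, 0.7, 0.9, 1.0]    # hypothetical three-symbol model
    table_b = [0.0, 0.2, 0.6, 1.0]    # a second hypothetical model
    low, high = encode_step(0.0, 1.0, table_a, 0)    # -> (0.0, 0.7)
    low, high = encode_step(low, high, table_b, 2)   # -> roughly (0.42, 0.7)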
Finally, it is much easier to adapt arithmetic codes to changing input statistics. All we need
to do is estimate the probabilities of the input alphabet. This can be done by keeping a count
of the letters as they are coded. There is no need to preserve a tree, as with adaptive Huffman
codes. Furthermore, there is no need to generate a code a priori, as in the case of Huffman
coding. This property allows us to separate the modeling and coding procedures in a manner
that is not very feasible with Huffman coding. This separation permits greater flexibility in
the design of compression systems, which can be used to great advantage.
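A minimal sketch of such an adaptive model, assuming a simple count-based estimate with all counts initialized to 1 (these choices are illustrative, not prescribed here):

    class AdaptiveModel:
        # Keep a count for each letter; the probability estimate is the count
        # divided by the running total, so the model tracks the input statistics.
        def __init__(self, alphabet):
            self.counts = {s: 1 for s in alphabet}    # start from a uniform estimate

        def probability(self, symbol):
            return self.counts[symbol] / sum(self.counts.values())

        def update(self, symbol):
            self.counts[symbol] += 1                  # one update per coded symbol

    model = AdaptiveModel("ab")
    for s in "aababa":
        p = model.probability(s)    # this estimate would drive the arithmetic coder
        model.update(s)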