Databases Reference
In-Depth Information
3.3 Nonbinary Huffman Codes
The binary Huffman coding procedure can be easily extended to the nonbinary case where
the code elements come from an m -ary alphabet, and m is not equal to two. Recall that we
obtained the Huffman algorithm based on the following observations about an optimum binary
prefix code:
1. Symbols that occur more frequently (have a higher probability of occurrence) will have
shorter codewords than symbols that occur less frequently.
2. The two symbols that occur least frequently will have the same length.
Also recall that an additional requirement is that the two symbols with the lowest probability
differ only in the last position.
We can obtain a nonbinary Huffman code in almost exactly the same way. The obvious
thing to do would be to modify the second observation to read “The m symbols that occur least
frequently will have the same length,” and also modify the additional requirement to read “The
m symbols with the lowest probability differ only in the last position.”
However, we run into a small problem with this approach. Consider the design of a ternary
Huffman code for a source with a six-letter alphabet. Using the rules described above, we
would first combine the three letters with the lowest probability into a composite letter. This
would give us a reduced alphabet with four letters. However, combining the three letters with
lowest probability from this alphabet would result in a further reduced alphabet consisting of
only two letters. We have three values to assign and only two letters. Instead of combining
three letters at the beginning, we could have combined two letters. This would result in a
reduced alphabet of size five. If we combined three letters from this alphabet, we would
end up with a final reduced alphabet size of three. Finally, we could combine two letters in
the second step, which would again result in a final reduced alphabet of size three. Which
alternative should we choose?
Recall that the symbols with the lowest probability will have the longest codeword. Fur-
thermore, all of the symbols that we combine together into a composite symbol will have
codewords of the same length. This means that all letters we combine together at the very first
stage will have codewords that have the same length, and these codewords will be the longest
of all the codewords. If at some stage we are allowed to combine less than m symbols, the
logical place to do this would be in the very first stage.
In the general case of an m -ary code and an M -letter alphabet, how many letters should
we combine in the first phase? Let m be the number of letters that are combined in the first
phase. Then m is the number between two and m , such that m modulo
(
m
1
)
is equal to M
modulo
(
m
1
)
.
Example3.3.1:
Generate a ternary Huffman code for a source with a six-letter alphabet and a probability model
P
(
a 1 ) =
(
a 3 ) =
(
a 4 ) =
.
,
(
a 5 ) =
.
,
(
a 6 ) =
.
(
a 2 ) =
.
P
P
0
2
P
0
25
P
0
1, and P
0
05. In this case
3, therefore m is either 2 or 3.
=
m
6
(
mod 2
) =
0
,
2
(
mod 2
) =
0
,
3
(
mod 2
) =
1
Search WWH ::




Custom Search