FIGURE 3.8 Tree for the canonical Huffman code.
We illustrate this procedure using the example we have been considering in this section. Our alphabet consists of five letters {a_1, a_2, a_3, a_4, a_5}. The lengths of the corresponding Huffman codewords are {2, 1, 3, 4, 4}. We generate the codewords from the shortest to the longest, keeping in mind the constraint that shorter codewords lexicographically precede longer codewords. The shortest codeword is the one assigned to a_2. The lexicographically smallest codeword of length one is 0, so the codeword for a_2 is 0. The codewords of length two have to be of the form 1x. Only one codeword of length two is required, namely the codeword for a_1, so we assign the codeword 10 to a_1. The codewords of length three now have to be of the form 11x. We need only a single codeword of length three, the codeword for a_3; therefore, the codeword for a_3 is 110. There are two codewords of length four, those for a_4 and a_5. Therefore, the codeword for a_4 is 1110 and the codeword for a_5 is 1111. This code can now be transmitted by simply sending the lengths. Unlike the previous code, we know exactly which codeword belongs to which letter in the alphabet. This is because of the lexicographic constraints. As can be seen from the tree for this code in Figure 3.8, the tree increases in depth as we move from left to right. We know from the sequence of lengths that the shortest codeword is for the letter a_2, the next shortest codeword is for the letter a_1, and so on. Thus, simply encoding the list of lengths is sufficient for storing the Huffman code. For an alphabet of size five this is not a significant saving; however, when the alphabet size is on the order of 256, the savings can be very significant.
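The construction just described is mechanical enough to sketch in code. The following Python fragment is a minimal sketch, not taken from the text; the function name canonical_code and the symbol-numbering convention are our own. It rebuilds the codewords of this example from the list of lengths alone, which is exactly what a decoder that receives only the lengths would have to do.

def canonical_code(lengths):
    # Build canonical Huffman codewords from a list of codeword lengths.
    # lengths[i] is the length assigned to symbol i; returns a dict
    # mapping symbol index -> codeword string. Shorter codewords
    # lexicographically precede longer ones, as required in the text.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    codes = {}
    code = 0
    prev_len = lengths[order[0]]
    for sym in order:
        # Appending zeros (a left shift) moves to the next longer length.
        code <<= lengths[sym] - prev_len
        codes[sym] = format(code, "0{}b".format(lengths[sym]))
        prev_len = lengths[sym]
        code += 1
    return codes

# Example from the text: lengths {2, 1, 3, 4, 4} for a_1, ..., a_5.
for sym, word in sorted(canonical_code([2, 1, 3, 4, 4]).items()):
    print("a_%d: %s" % (sym + 1, word))
# Prints: a_1: 10, a_2: 0, a_3: 110, a_4: 1110, a_5: 1111

Ties between equal lengths are broken by symbol index, which matches the convention above that the codeword for a_4 precedes the codeword for a_5.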
3.2.3 Length-Limited Huffman Codes
The Huffman coding algorithm tries to minimize the average length of codewords. However,
there are no limits on the maximum length of an individual codeword. This is not necessarily a
problem when dealing with limited alphabet sizes. However, if the alphabet size is relatively
large, there is a possibility of getting very long Huffman codes. This could result in codewords
that do not fit in a single machine word. This in turn can lead to inefficiencies in the coding
process. These problems can be avoided by adding an additional constraint to the variable-length code design problem: a requirement that all codewords be less than or equal to some maximum codeword length l_max. If m is the size of the source alphabet, then clearly l_max has to be greater than or equal to ⌈log_2 m⌉ for the code to be a valid code.
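As a rough illustration (the helper names below are ours, not the book's), the smallest admissible value of l_max follows from the fact that there are only 2^l_max distinct binary strings of length l_max; the Kraft inequality, which any prefix code must satisfy, gives the same bound. A minimal Python sketch:

import math

def smallest_l_max(m):
    # There are only 2**l distinct binary strings of length l, so m
    # codewords of length at most l_max require 2**l_max >= m,
    # i.e., l_max >= ceil(log2(m)).
    return math.ceil(math.log2(m))

def kraft_ok(lengths):
    # A prefix code with these codeword lengths exists if and only if
    # sum(2**-l) <= 1 (Kraft inequality).
    return sum(2.0 ** -l for l in lengths) <= 1.0

m = 256
l_max = smallest_l_max(m)             # 8 for a 256-letter alphabet
print(l_max, kraft_ok([l_max] * m))   # 8 True
print(kraft_ok([l_max - 1] * m))      # False: 7-bit codewords cannot cover 256 letters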