Databases Reference
In-Depth Information
F I GU R E 3 . 5
A binary tree of depth four.
bits per second. The same situation using the second code would lead to a deficit of 2000
bits per second. Thus, it seems reasonable to elect to use the second code instead of the first.
To obtain the Huffman code with minimum variance, we always put the combined letter as
high in the list as possible.
3.2.2 Canonical Huffman Codes
To be effective, a Huffman code needs to reflect the statistics of the source being encoded.
When encoding sources with widely varying statistics, it is sometimes necessary to include
the Huffman code along with the compressed representation of the source. Depending on the
alphabet size, this can be a substantial burden, perhaps even cancelling out any advantages of
compression. Therefore, it is useful to develop Huffman codes that can be stored in an efficient
manner. Consider the Huffman code in Table 3.5 . One way to store this code is to write the
code in lexicographic order of the symbols. Thus, we first write the code for a 1 , then the code
for a 2 , and so on. Each codeword could be represented by the length of the codeword followed
by the codeword. Thus the code would be stored as
[
,
,
,
,
,
,
,
,
,
]
.This
is certainly manageable for the Huffman code of a small alphabet, but it is clearly going to have
an impact on compression performance when we have Huffman codes for large alphabets.
We can substantially reduce the storage requirement for the code by using a version of the
Huffman code known as the canonical Huffman code . Before we describe how to construct a
canonical Huffman code, we examine a property of Huffman trees. Consider a Huffman tree
such as one of the trees in Figure 3.4 . For convenience, let us use the tree on the left. Notice
that like all Huffman trees, this is a fully populated tree. That is, there is a codeword assigned
to each leaf of the tree. What we will show is that if we are only given the lengths of the
codewords moving from left to right on the tree we can reconstruct the code. For example, in
this tree the lengths are 3, 4, 4, 2, and 1. In order to regenerate the code from the lengths, we
begin with a tree of depth four (the longest codeword), as shown in Figure 3.5 .
We then prune away the branches starting on the left, to match the lengths of the codewords.
We first prune away the leftmost two branches, as shown in Figure 3.6 , to leave a leaf at a
depth of three. This corresponds to the first codeword. The next two codewords have a length
of four, so we leave the next two leaves at a depth of four. The next codeword has a length
of two so we prune all the lower branches as indicated in Figure 3.6 , leaving a leaf at depth
two. Finally, we prune most of the right half of the tree to get a leaf corresponding to the
codeword of length one. The pruned tree can be populated by zeros on the left branches and
2
01
1
1
3
000
4
0010
5
0011
Search WWH ::




Custom Search