Huffman Coding - Introduction to Data Compression - page 48

Databases Reference

In-Depth Information

F I GU R E 3 . 5

A binary tree of depth four.

bits per second. The same situation using the second code would lead to a deficit of 2000

bits per second. Thus, it seems reasonable to elect to use the second code instead of the first.

To obtain the Huffman code with minimum variance, we always put the combined letter as

high in the list as possible.

3.2.2 Canonical Huffman Codes

To be effective, a Huffman code needs to reflect the statistics of the source being encoded.

When encoding sources with widely varying statistics, it is sometimes necessary to include

the Huffman code along with the compressed representation of the source. Depending on the

alphabet size, this can be a substantial burden, perhaps even cancelling out any advantages of

compression. Therefore, it is useful to develop Huffman codes that can be stored in an efficient

manner. Consider the Huffman code in Table 3.5 . One way to store this code is to write the

code in lexicographic order of the symbols. Thus, we first write the code for a 1 , then the code

for a 2 , and so on. Each codeword could be represented by the length of the codeword followed

by the codeword. Thus the code would be stored as

[

,

,

,

,

,

,

,

,

,

]

.This

is certainly manageable for the Huffman code of a small alphabet, but it is clearly going to have

an impact on compression performance when we have Huffman codes for large alphabets.

We can substantially reduce the storage requirement for the code by using a version of the

Huffman code known as the canonical Huffman code . Before we describe how to construct a

canonical Huffman code, we examine a property of Huffman trees. Consider a Huffman tree

such as one of the trees in Figure 3.4 . For convenience, let us use the tree on the left. Notice

that like all Huffman trees, this is a fully populated tree. That is, there is a codeword assigned

to each leaf of the tree. What we will show is that if we are only given the lengths of the

codewords moving from left to right on the tree we can reconstruct the code. For example, in

this tree the lengths are 3, 4, 4, 2, and 1. In order to regenerate the code from the lengths, we

begin with a tree of depth four (the longest codeword), as shown in Figure 3.5 .

We then prune away the branches starting on the left, to match the lengths of the codewords.

We first prune away the leftmost two branches, as shown in Figure 3.6 , to leave a leaf at a

depth of three. This corresponds to the first codeword. The next two codewords have a length

of four, so we leave the next two leaves at a depth of four. The next codeword has a length

of two so we prune all the lower branches as indicated in Figure 3.6 , leaving a leaf at depth

two. Finally, we prune most of the right half of the tree to get a leaf corresponding to the

codeword of length one. The pruned tree can be populated by zeros on the left branches and

2

01

1

1

3

000

4

0010

5

0011

Next Page

Introduction to Data Compression

Search WWH ::

Custom Search

Home