FIGURE 3.1 The Huffman encoding procedure applied to the symbols a1 (0.2), a2 (0.4), a3 (0.2), a4 (0.1), and a5 (0.1). The symbol probabilities are listed in parentheses.
A measure of the efficiency of this code is its redundancy, the difference between the average codeword length and the entropy. In this case, the redundancy is 0.078 bits/symbol. The redundancy is zero when the probabilities are negative integer powers of two, that is, of the form 2^-k.
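As a quick sanity check on these numbers, the following sketch recomputes the entropy, average length, and redundancy for the alphabet of Example 3.2.1. The codeword lengths used here (1 bit for a2, 2 for a1, 3 for a3, and 4 each for a4 and a5) are an assumption consistent with the procedure of Figure 3.1 and with the stated redundancy of 0.078 bits/symbol.

```python
from math import log2

# Alphabet of Example 3.2.1: symbol -> probability.
probs = {"a1": 0.2, "a2": 0.4, "a3": 0.2, "a4": 0.1, "a5": 0.1}
# Assumed codeword lengths for the Huffman code of Figure 3.1.
lengths = {"a1": 2, "a2": 1, "a3": 3, "a4": 4, "a5": 4}

entropy = -sum(p * log2(p) for p in probs.values())  # about 2.122 bits/symbol
avg_len = sum(probs[s] * lengths[s] for s in probs)  # 2.2 bits/symbol
print(f"redundancy = {avg_len - entropy:.3f} bits/symbol")  # prints 0.078
```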
An alternative way of building a Huffman code is to use the fact that the Huffman code, by
virtue of being a prefix code, can be represented as a binary tree in which the external nodes
or leaves correspond to the symbols. The Huffman code for any symbol can be obtained by
traversing the tree from the root node to the leaf corresponding to the symbol, adding a 0 to
the codeword every time the traversal takes us over an upper branch and a 1 every time the
traversal takes us over a lower branch.
We build the binary tree starting at the leaf nodes. We know that the codewords for the
two symbols with smallest probabilities are identical except for the last bit. This means that
the traversal from the root to the leaves corresponding to these two symbols must be the same
except for the last step. This in turn means that the leaves corresponding to the two symbols
with the lowest probabilities are offspring of the same node. Once we have connected the
leaves corresponding to the symbols with the lowest probabilities to a single node, we treat
this node as a symbol of a reduced alphabet. The probability of this symbol is the sum of
the probabilities of its offspring. We can now sort the nodes corresponding to the reduced
alphabet and apply the same rule to generate a parent node for the nodes corresponding to the
two symbols in the reduced alphabet with lowest probabilities. Continuing in this manner, we
end up with a single node, which is the root node. To obtain the code for each symbol, we
traverse the tree from the root to each leaf node, assigning a 0 to the upper branch and a 1 to
the lower branch. This procedure, as applied to the alphabet of Example 3.2.1, is shown in Figure 3.2. Notice the similarity between Figures 3.1 and 3.2. This is not surprising, as they are a result of viewing the same procedure in two different ways.
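To make the bottom-up construction concrete, here is a minimal Python sketch of the same procedure using a priority queue. The symbol names and probabilities are those of Example 3.2.1; note that ties between equal probabilities can be broken in more than one way, so the code it prints may differ from the one in Figures 3.1 and 3.2 while having the same average length.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a Huffman code by repeatedly merging the two least probable
    nodes, as described above. Returns a dict mapping symbol -> codeword."""
    tiebreak = count()  # keeps heap comparisons well-defined on equal probabilities
    # Each heap entry is (probability, tiebreak, tree), where a tree is
    # either a symbol (a leaf) or a (left, right) pair (an internal node).
    heap = [(p, next(tiebreak), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)  # lowest probability
        p2, _, t2 = heapq.heappop(heap)  # second lowest
        # The new node stands for a symbol of the reduced alphabet; its
        # probability is the sum of the probabilities of its offspring.
        heapq.heappush(heap, (p1 + p2, next(tiebreak), (t1, t2)))
    # Traverse from the root: 0 for the upper branch, 1 for the lower.
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            upper, lower = tree
            walk(upper, prefix + "0")
            walk(lower, prefix + "1")
        else:
            codes[tree] = prefix
    _, _, root = heap[0]
    walk(root, "")
    return codes

probs = {"a1": 0.2, "a2": 0.4, "a3": 0.2, "a4": 0.1, "a5": 0.1}
print(huffman_code(probs))
```

Popping the two least probable nodes and pushing their parent back onto the heap is exactly the reduced-alphabet step described above; the final entry left on the heap is the root, from which the codewords are read off by the 0/1 traversal.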
3.2.1 Minimum Variance Huffman Codes
By performing the sorting procedure in a slightly different manner, we could have found a
different Huffman code. In the first re-sort, we could place a4 higher in the list, as shown in Table 3.6.