It is easy to see that the first observation is correct. If symbols that occur more often had codewords that were longer than the codewords for symbols that occur less often, the average number of bits per symbol would be larger than if the conditions were reversed. Therefore, a code that assigns longer codewords to symbols that occur more frequently cannot be optimum.
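To make the arithmetic behind the first observation concrete, here is a minimal sketch (the two-symbol probabilities and codeword lengths are illustrative, not taken from the text): assigning the longer codeword to the more probable symbol strictly increases the average length.

```python
# Minimal sketch of observation 1 (probabilities and lengths are illustrative).
# Average codeword length = sum of P(symbol) * length(codeword) over all symbols.

def average_length(probs, lengths):
    return sum(p * l for p, l in zip(probs, lengths))

probs = [0.7, 0.3]                       # symbol 1 occurs more often than symbol 2

good = average_length(probs, [1, 2])     # shorter codeword for the frequent symbol
bad  = average_length(probs, [2, 1])     # same lengths, but swapped

print(good)   # 1.3 bits/symbol
print(bad)    # 1.7 bits/symbol -- giving the longer codeword to the
              # more frequent symbol can only increase the average
```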
To see why the second observation holds true, consider the following situation. Suppose an optimum code C exists in which the two codewords corresponding to the two least probable symbols do not have the same length. Suppose the longer codeword is k bits longer than the shorter codeword. Because this is a prefix code, the shorter codeword cannot be a prefix of the longer codeword. This means that even if we drop the last k bits of the longer codeword, the two codewords would still be distinct. As these codewords correspond to the least probable symbols in the alphabet, no other codeword can be longer than these codewords; therefore, there is no danger that the shortened codeword would become the prefix of some other codeword. Furthermore, by dropping these k bits we obtain a new code that has a shorter average length than C. But this violates our initial contention that C is an optimal code. Therefore, for an optimal code the second observation also holds true.
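The truncation argument can be checked on a small concrete case (the code and probabilities below are illustrative, not from the text): dropping the extra bit of the longer of the two least probable codewords leaves a valid prefix code with a smaller average length.

```python
# Illustrative check of observation 2 (code and probabilities are made up).

def is_prefix_free(codewords):
    """True if no codeword is a proper prefix of another."""
    return not any(a != b and b.startswith(a) for a in codewords for b in codewords)

def average_length(code, probs):
    return sum(probs[s] * len(w) for s, w in code.items())

probs = {"a": 0.5, "b": 0.3, "c": 0.1, "d": 0.1}

# The two least probable symbols, c and d, have codewords of different lengths.
code = {"a": "0", "b": "10", "c": "110", "d": "1110"}

# Dropping the last bit of d's codeword keeps the code prefix-free
# and lowers the average length.
shorter = dict(code, d="111")

print(is_prefix_free(code.values()), is_prefix_free(shorter.values()))   # True True
print(average_length(code, probs), average_length(shorter, probs))       # 1.8 1.7
```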
The Huffman procedure adds a simple requirement to these two observations. This requirement is that the codewords corresponding to the two lowest probability symbols differ only in the last bit. Assume that γ and δ are the two least probable symbols in an alphabet. If the codeword for γ is m∗0, the codeword for δ would be m∗1. Here, m is a string of 1s and 0s, and ∗ denotes concatenation.
This requirement does not violate our two observations and leads to a very simple encoding
procedure. We describe this procedure with the help of the following example.
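In code form, the procedure suggested by these observations and the added requirement can be sketched as follows (the function name huffman and the dictionary interface are mine, not from the text): repeatedly merge the two least probable symbols into a composite symbol, design a code for the reduced alphabet, and then append a 0 and a 1 to the composite symbol's codeword. The worked example traces the same steps by hand.

```python
# A sketch of the Huffman procedure described above
# (function name and interface are illustrative, not from the text).

def huffman(probs):
    """probs: dict mapping symbol -> probability; returns dict symbol -> codeword."""
    if len(probs) == 2:
        # Two symbols: their codewords are 0 and 1, differing only in the last bit.
        hi, lo = sorted(probs, key=probs.get, reverse=True)
        return {hi: "0", lo: "1"}

    # Merge the two least probable symbols into a single composite symbol.
    ordered = sorted(probs, key=probs.get)
    x, y = ordered[0], ordered[1]
    reduced = {s: p for s, p in probs.items() if s not in (x, y)}
    reduced[(x, y)] = probs[x] + probs[y]

    # Design a code for the reduced alphabet, then split the composite codeword:
    # the two least probable symbols get codewords that differ only in the last bit.
    code = huffman(reduced)
    m = code.pop((x, y))
    code[x] = m + "0"
    code[y] = m + "1"
    return code
```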
Example 3.2.1: Design of a Huffman Code
Let us design a Huffman code for a source that puts out letters from an alphabet A = {a1, a2, a3, a4, a5} with P(a1) = P(a3) = 0.2, P(a2) = 0.4, and P(a4) = P(a5) = 0.1. The entropy for this source is 2.122 bits/symbol. To design the Huffman code, we first sort the letters in descending order of probability, as shown in Table 3.1. Here c(ai) denotes the codeword for ai.
The two symbols with the lowest probability are a4 and a5. Therefore, we can assign them codewords that differ only in the last bit.
TABLE 3.1 The initial five-letter alphabet.

Letter    Probability    Codeword
a2        0.4            c(a2)
a1        0.2            c(a1)
a3        0.2            c(a3)
a4        0.1            c(a4)
a5        0.1            c(a5)
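As a cross-check, we can apply the huffman() sketch from above to the alphabet of Table 3.1 (assuming that function is in scope; the exact bit patterns depend on how ties among equal probabilities are broken, but any Huffman code for this source averages 2.2 bits/symbol, against an entropy of 2.122 bits/symbol):

```python
# Applying the huffman() sketch above to the alphabet of Table 3.1.
from math import log2

probs = {"a2": 0.4, "a1": 0.2, "a3": 0.2, "a4": 0.1, "a5": 0.1}
code = huffman(probs)

avg = sum(probs[s] * len(code[s]) for s in probs)
entropy = -sum(p * log2(p) for p in probs.values())

print(code)               # one valid prefix code; codewords depend on tie-breaking
print(round(avg, 3))      # 2.2 bits/symbol
print(round(entropy, 3))  # 2.122 bits/symbol
```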