TABLE 3.12  Huffman code for the alphabet A.

    Letter    Codeword
    a1        0
    a2        11
    a3        10
Example 3.2.4: Consider a source that puts out iid letters from the alphabet A = {a1, a2, a3} with the probability model P(a1) = 0.8, P(a2) = 0.02, and P(a3) = 0.18. The entropy for this source is 0.816 bits/symbol. A Huffman code for this source is shown in Table 3.12. The average length for this code is 1.2 bits/symbol. The difference between the average code length and the entropy, or the redundancy, for this code is 0.384 bits/symbol, which is 47% of the entropy. This means that to code this sequence, we would need 47% more bits than the minimum required.
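The numbers in this example are easy to check directly. The following sketch (plain Python; the dictionary names are ours, chosen for illustration) computes the entropy, the average codeword length from Table 3.12, and the redundancy:

```python
import math

# Probability model and Huffman code from Example 3.2.4 / Table 3.12
p = {"a1": 0.8, "a2": 0.02, "a3": 0.18}
code = {"a1": "0", "a2": "11", "a3": "10"}

# Entropy: H(S) = -sum_i P(a_i) log2 P(a_i), in bits/symbol
entropy = -sum(q * math.log2(q) for q in p.values())

# Average codeword length: sum_i P(a_i) * len(codeword_i)
avg_len = sum(p[s] * len(code[s]) for s in p)

# Redundancy: average length minus entropy
redundancy = avg_len - entropy

print(round(entropy, 3))                  # 0.816 bits/symbol
print(round(avg_len, 3))                  # 1.2 bits/symbol
print(round(redundancy, 3))               # 0.384 bits/symbol
print(round(100 * redundancy / entropy))  # 47 (% of entropy)
```

The large redundancy here comes from the very skewed probabilities: a1 carries most of the probability mass but still costs a full bit per occurrence.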
We can sometimes reduce the coding rate by blocking more than one symbol together. To see how this can happen, consider a source S that emits a sequence of letters from an alphabet A = {a1, a2, ..., am}. Each element of the sequence is generated independently of the other elements in the sequence. The entropy for this source is given by

    H(S) = - sum_{i=1}^{m} P(a_i) log2 P(a_i)
We know that we can generate a Huffman code for this source with rate R such that

    H(S) <= R < H(S) + 1        (5)

We have used the looser bound here; the same argument can be made with the tighter bound. Notice that we have used "rate R" to denote the number of bits per symbol. This is a standard convention in the data compression literature. However, in the communication literature, the word "rate" often refers to the number of bits per second.

Suppose we now encode the sequence by generating one codeword for every n symbols. As there are m^n combinations of n symbols, we will need m^n codewords in our Huffman code. We could generate this code by viewing the m^n symbols as letters of an extended alphabet

    A^(n) = {a1 a1 ... a1, a1 a1 ... a2, ..., a1 a1 ... am, a1 a1 ... a2 a1, ..., am am ... am}

where each block is n symbols long, from a source S^(n). Let us denote the rate for the new source as R^(n). Then we know that

    H(S^(n)) <= R^(n) < H(S^(n)) + 1        (6)

R^(n) is the number of bits required to code n symbols. Therefore, the number of bits required per symbol, R, is given by

    R = (1/n) R^(n)
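To make the blocking argument concrete, here is a small sketch (plain Python; the helper `huffman_lengths` and the choice n = 2 are ours, not from the text) that builds a Huffman code for the source of Example 3.2.4 and for its second extension, then compares the rates per original symbol:

```python
import heapq
import itertools

p = {"a1": 0.8, "a2": 0.02, "a3": 0.18}

def huffman_lengths(probs):
    # Huffman's algorithm: repeatedly merge the two least-probable
    # groups; each merge adds one bit to every symbol in the group.
    heap = [(q, i, (s,)) for i, (s, q) in enumerate(probs.items())]
    heapq.heapify(heap)
    depth = {s: 0 for s in probs}
    uid = itertools.count(len(heap))  # tie-breaker for equal probabilities
    while len(heap) > 1:
        q1, _, g1 = heapq.heappop(heap)
        q2, _, g2 = heapq.heappop(heap)
        for s in g1 + g2:
            depth[s] += 1
        heapq.heappush(heap, (q1 + q2, next(uid), g1 + g2))
    return depth

# Rate of the single-symbol code (bits per symbol)
len1 = huffman_lengths(p)
rate1 = sum(p[s] * len1[s] for s in p)

# Second extension: one codeword per pair of symbols (n = 2),
# with P(a_i a_j) = P(a_i) P(a_j) since the source is iid
p2 = {(s, t): p[s] * p[t] for s in p for t in p}
len2 = huffman_lengths(p2)
rate2 = sum(p2[k] * len2[k] for k in p2) / 2  # bits per original symbol

print(round(rate1, 4), round(rate2, 4))  # 1.2 0.8614
```

Coding pairs of symbols drops the rate from 1.2 to about 0.8614 bits/symbol, much closer to the entropy of 0.816 bits/symbol, in line with the bound R < H(S) + 1/n for the n-symbol extension.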