As the code is an optimal code, it is uniquely decodable, and so

$$\sum_{i=1}^{K} 2^{-l_i} \le 1.$$

Therefore, $\log_2\left[\sum_{i=1}^{K} 2^{-l_i}\right] \le 0$, and

$$H(S) - \bar{l} \le 0 \tag{3}$$

We will prove the upper bound by showing that there exists a uniquely decodable code with average codeword length less than $H(S) + 1$. Therefore, if we have an optimal code, this code must have an average length that is less than $H(S) + 1$.
Given a source, alphabet, and probability model as before, define

$$l_i = \left\lceil \log_2 \frac{1}{P(a_i)} \right\rceil$$

where $\lceil x \rceil$ is the smallest integer greater than or equal to $x$. For example, $\lceil 3.3 \rceil = 4$ and $\lceil 5 \rceil = 5$. Thus,

$$\lceil x \rceil = x + \epsilon \qquad \text{where } 0 \le \epsilon < 1.$$
Therefore,

$$\log_2 \frac{1}{P(a_i)} \le l_i < \log_2 \frac{1}{P(a_i)} + 1 \tag{4}$$
From the left inequality of (4), we can see that

$$2^{-l_i} \le P(a_i)$$
Accordingly,

$$\sum_{i=1}^{K} 2^{-l_i} \le \sum_{i=1}^{K} P(a_i) = 1$$
and by the Kraft-McMillan inequality, there exists a uniquely decodable code with codeword lengths $\{l_i\}$. The average length of this code can be upper-bounded by using the right inequality of (4):

$$\bar{l} = \sum_{i=1}^{K} P(a_i)\, l_i < \sum_{i=1}^{K} P(a_i) \left[ \log_2 \frac{1}{P(a_i)} + 1 \right]$$

or

$$\bar{l} < H(S) + 1.$$
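To see the construction in action, here is a short Python sketch (an illustration added here, using a made-up four-letter probability model) that computes the lengths $l_i = \lceil \log_2 1/P(a_i) \rceil$, checks the Kraft-McMillan sum, and verifies $H(S) \le \bar{l} < H(S) + 1$ numerically:

```python
import math

# Made-up probability model; any set of probabilities summing to 1 works.
probs = [0.4, 0.3, 0.2, 0.1]

# Codeword lengths l_i = ceil(log2(1 / P(a_i))), as defined in the text.
lengths = [math.ceil(math.log2(1 / p)) for p in probs]   # -> [2, 2, 3, 4]

# Kraft-McMillan sum: at most 1, so a uniquely decodable code exists.
kraft = sum(2.0 ** -l for l in lengths)

entropy = sum(p * math.log2(1 / p) for p in probs)       # H(S)
l_bar = sum(p * l for p, l in zip(probs, lengths))       # average length

print(f"Kraft sum = {kraft:.4f}")
print(f"H(S) = {entropy:.4f}, average length = {l_bar:.4f}")
assert kraft <= 1 and entropy <= l_bar < entropy + 1
```

For this model the Kraft sum is 0.6875 and $\bar{l} = 2.4$ against $H(S) \approx 1.846$, so both inequalities hold with room to spare.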
We can see from the way the upper bound was derived that this is a rather loose upper bound. In fact, it can be shown that if $p_{\max}$ is the largest probability in the probability model, then for $p_{\max} \ge 0.5$, the upper bound for the Huffman code is $H(S) + p_{\max}$, while for $p_{\max} < 0.5$, the upper bound is $H(S) + p_{\max} + 0.086$. Obviously, this is a much tighter bound than the one we derived above. The derivation of this bound takes some time (see [21] for details).
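The bound is easy to check empirically. The sketch below is our illustration, not the derivation in [21]: `huffman_lengths` is a hypothetical helper name, the model is made up, and the heap-based merging is simply the standard Huffman procedure. It builds a code for a skewed source with $p_{\max} = 0.8 \ge 0.5$ and compares the average length against $H(S) + p_{\max}$:

```python
import heapq
import itertools
import math

def huffman_lengths(probs):
    """Return Huffman codeword lengths (in bits) for the given probabilities."""
    counter = itertools.count()  # tiebreaker so the heap never compares trees
    # Heap entries: (probability, tiebreaker, tree); a tree is a symbol index
    # (leaf) or a pair of subtrees (internal node).
    heap = [(p, next(counter), i) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)
        p2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(counter), (t1, t2)))
    lengths = [0] * len(probs)

    def walk(tree, depth):
        if isinstance(tree, int):        # leaf: record its depth
            lengths[tree] = depth
        else:                            # internal node: descend both branches
            walk(tree[0], depth + 1)
            walk(tree[1], depth + 1)

    walk(heap[0][2], 0)
    return lengths

# Made-up skewed probability model with p_max = 0.8 >= 0.5.
probs = [0.8, 0.1, 0.05, 0.05]
lengths = huffman_lengths(probs)          # -> [1, 2, 3, 3]
entropy = sum(p * math.log2(1 / p) for p in probs)
l_bar = sum(p * l for p, l in zip(probs, lengths))
p_max = max(probs)
tight = entropy + p_max if p_max >= 0.5 else entropy + p_max + 0.086
print(f"H(S) = {entropy:.4f}, Huffman average length = {l_bar:.4f}")
print(f"tight bound = {tight:.4f}, loose bound H(S)+1 = {entropy + 1:.4f}")
```

For this model $\bar{l} = 1.3$, comfortably below $H(S) + p_{\max} \approx 1.822$, and the gap to the loose bound $H(S) + 1 \approx 2.022$ shows how much the tighter result buys.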
3.2.6 Extended Huffman Codes
In applications where the alphabet size is large, $p_{\max}$ is generally quite small, and the amount of deviation from the entropy, especially in terms of a percentage of the rate, is quite small. However, in cases where the alphabet is small and the probability of occurrence of the different letters is skewed, the value of $p_{\max}$ can be quite large, and the Huffman code can become rather inefficient when compared to the entropy.
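As a preview of where this section is headed, the following self-contained sketch (ours; `huffman_average_length` is a hypothetical helper, and it relies on the standard fact that the Huffman average length equals the sum of the probabilities formed at each merge) shows a skewed two-letter source whose Huffman rate is far above the entropy, and how coding pairs of symbols already closes part of the gap:

```python
import heapq
import itertools
import math

def huffman_average_length(probs):
    """Average Huffman codeword length, computed as the sum of the
    probabilities formed at each merge (every merge adds one bit to
    all codewords beneath the new internal node)."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total

p = {"a": 0.9, "b": 0.1}   # made-up skewed two-letter source
entropy = sum(q * math.log2(1 / q) for q in p.values())

# One symbol at a time: both codewords are 1 bit, so the rate is 1 bit/symbol.
rate_single = huffman_average_length(p.values())

# Extended alphabet: code pairs of symbols as single "letters".
pair_probs = [p[x] * p[y] for x, y in itertools.product(p, repeat=2)]
rate_pairs = huffman_average_length(pair_probs) / 2   # bits per original symbol

print(f"H(S) = {entropy:.3f} bits/symbol")        # about 0.469
print(f"single-symbol rate = {rate_single:.3f}")  # 1.000
print(f"pair rate = {rate_pairs:.3f}")            # about 0.645
```

Here the single-symbol rate of 1 bit is more than double the entropy of about 0.469 bits, while coding pairs already brings the rate down to about 0.645 bits per symbol; this blocking of symbols is exactly the idea behind extended Huffman codes.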