Huffman Coding - Introduction to Data Compression

Databases Reference

In-Depth Information

The most widely used algorithm for constructing length-limited Huffman codes is the

package-merge algorithm due to Larmore and Hirchberg [ 18 ]. The package-merge algorithm

is more general than what we describe here, our only interest being its application to designing

length-limited Huffman codes. Our description is based on the work of Turpin and Moffat

[ 19 ]. We will use a couple of facts from our prior discussions to design length-limited Huffman

codes. First, as we have seen in the previous section, all we need to obtain a Huffman code

is the length of the codewords. Second, as we showed in Chapter 2, for an alphabet of size

m the lengths of the codewords

{

l 1 ,

l 2 , ··· ,

l m }

in any variable length code have to satisfy the

Kraft-McMillan inequality

2 − l i

and, given a set of l i that satisfy the Kraft-McMillan inequality, we can always find a uniquely

decodable code with codewords of length l i .

Suppose our alphabet consists of

{

a 1 ,

a 2 ,...,

a m }

the letters

with probabilities

{

p 1 ,

p 2 ,...,

p m }

. We will assume that the letters have been sorted based on their proba-

bilities such that p 1

p m . Our design process will involve incrementing the

lengths l i until the Kraft-McMillan inequality is satisfied while at the same time keeping the

average length l as small as we can. We start with assigning zero bits for all lengths. Clearly,

this will result in the lowest possible length of zero. However, it certainly does not give us a

uniquely decodable code. Not that we need to justify the latter point mathematically, but if we

did, we could compute the Kraft-McMillan sum as

p 2 ···

2 − l i

2 0

which is clearly greater than one. We can decrement this sum by 0.5 by incrementing the

codeword for one of the letters to one. We want to pick the codeword that will have the

minimal impact on the average codeword length. As the letters are sorted in nondecreasing

probability order and the average codeword length is given by

p i l i

we will increase l 1 to one. This will give us an average codeword length of p 1 and a Kraft-

McMillan sum of m

5. Actually, we could have picked any of the letters as we will need to

increment each length to at least one if we are to have any hope of reducing the Kraft-McMillan

sum to one. However, once we have incremented each codeword length to one how do we

decide which codeword lengths to increment further? The package-merge algorithm is an

iterative algorithm that solves this problem by generating a list of choices that can be sorted in

order of increasing cost. Each iteration consists of two steps operating on the list, a packaging

step and a merge step. In the packaging step the list from the previous step is partitioned,

or packaged, into groups of two neighboring items. The cost of a package is the sum of the

costs of the items in the package where the cost of a letter is the probability of that letter. If

the number of items is odd the item with the highest cost is discarded. In the merge step the

−

Introduction to Data Compression

Search WWH ::

Custom Search

Home