Example 3.2.5:
For the source described in the previous example, we will generate a codeword for every two
symbols, instead of generating a codeword for every symbol. If we look at the source sequence
two at a time, the number of possible symbol pairs, or size of the extended alphabet, is
3^2 = 9. The extended alphabet, probability model, and Huffman code for this example are
shown in Table 3.13.
TABLE 3.13  The extended alphabet and corresponding Huffman code

Letter    Probability    Code
a1 a1     0.64           0
a1 a2     0.016          10101
a1 a3     0.144          11
a2 a1     0.016          101000
a2 a2     0.0004         10100101
a2 a3     0.0036         1010011
a3 a1     0.1440         100
a3 a2     0.0036         10100100
a3 a3     0.0324         1011
The average codeword length for this extended code is 1.7228 bits/symbol. However,
each symbol in the extended alphabet corresponds to two symbols from the original alphabet.
Therefore, in terms of the original alphabet, the average codeword length is
1.7228/2 = 0.8614 bits/symbol. The redundancy is about 0.045 bits/symbol, which is only
about 5.5% of the entropy.
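These numbers can be checked directly from Table 3.13. The short sketch below (not from the text) computes the average codeword length per pair and per original symbol from the tabulated probabilities and codewords:

```python
# Pair probabilities and Huffman codewords, copied from Table 3.13.
table = {
    "a1a1": (0.64,   "0"),
    "a1a2": (0.016,  "10101"),
    "a1a3": (0.144,  "11"),
    "a2a1": (0.016,  "101000"),
    "a2a2": (0.0004, "10100101"),
    "a2a3": (0.0036, "1010011"),
    "a3a1": (0.1440, "100"),
    "a3a2": (0.0036, "10100100"),
    "a3a3": (0.0324, "1011"),
}

# Average codeword length per extended symbol (i.e., per symbol pair).
avg_per_pair = sum(p * len(code) for p, code in table.values())

# Each extended symbol stands for two original symbols.
avg_per_symbol = avg_per_pair / 2

print(avg_per_pair)    # ~1.7228 bits per pair
print(avg_per_symbol)  # ~0.8614 bits per original symbol
```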
We see that by coding blocks of symbols together, we can reduce the redundancy of
Huffman codes. In the previous example, two symbols were blocked together to obtain a rate
reasonably close to the entropy. Blocking two symbols together means the alphabet size goes
from m to m^2, where m was the size of the initial alphabet. In this case, m was three, so the size
of the extended alphabet was nine. This size is not an excessive burden for most applications.
However, if the probabilities of the symbols were more unbalanced, then it would require
blocking many more symbols together before the redundancy was lowered to acceptable levels.
As we block more and more symbols together, the size of the alphabet grows exponentially, and
the Huffman coding scheme becomes impractical. Under these conditions, we need to look at
techniques other than Huffman coding. One approach that is very useful in these conditions
is arithmetic coding. We will discuss this technique in some detail in the next chapter.
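To make the blocking step concrete, here is a small sketch (not from the text) that builds a Huffman code over the extended alphabet using Python's heapq. The single-letter probabilities 0.8, 0.02, and 0.18 are the marginals implied by Table 3.13; the helper returns only codeword lengths, which is all that is needed to compute the rate:

```python
import heapq
import itertools

def huffman_lengths(probs):
    """Codeword lengths of a Huffman code for `probs` (dict: symbol -> probability)."""
    counter = itertools.count()  # tie-breaker so heap tuples never compare the lists
    heap = [(p, next(counter), [s]) for s, p in probs.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(probs, 0)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)   # two least probable subtrees
        p2, _, group2 = heapq.heappop(heap)
        for s in group1 + group2:             # merging pushes every symbol in
            lengths[s] += 1                   # both subtrees one level deeper
        heapq.heappush(heap, (p1 + p2, next(counter), group1 + group2))
    return lengths

# Single-letter model: the marginals implied by Table 3.13.
p = {"a1": 0.8, "a2": 0.02, "a3": 0.18}

# Extended alphabet: all 3^2 = 9 ordered pairs, with product probabilities.
pairs = {x + y: p[x] * p[y] for x in p for y in p}

lengths = huffman_lengths(pairs)
avg_per_pair = sum(pairs[s] * lengths[s] for s in pairs)
print(avg_per_pair / 2)   # ~0.8614 bits per original symbol
```

Individual codeword lengths may differ from Table 3.13 because of ties among equal probabilities, but every Huffman code achieves the same optimal average, 1.7228 bits per pair.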
3.2.7 Implementation of Huffman Codes
Huffman encoding is relatively simple to implement using a table lookup. Decoding is another
matter. If speed is the only factor, we can implement the decoder using table lookup as well.
However, for the decoder, the table has to be of size 2^N, where N is the length of the longest
 