Huffman Coding - Introduction to Data Compression

Databases Reference

In-Depth Information

The Huffman code can be described in terms of a binary tree similar to the ones shown in

Figure 3.4 . The squares denote the external nodes or leaves and correspond to the symbols in

the source alphabet. The codeword for a symbol can be obtained by traversing the tree from

the root to the leaf corresponding to the symbol, where 0 corresponds to a left branch and 1

corresponds to a right branch. In order to describe how the adaptive Huffman code works,

we add two other parameters to the binary tree: the weight of each leaf, which is written as a

number inside the node, and a node number . The weight of each external node is simply the

number of times the symbol corresponding to the leaf has been encountered. The weight of

each internal node is the sum of the weights of its offspring. The node number y i is a unique

number assigned to each internal and external node. If we have an alphabet of size n , then

the 2 n

−

1 internal and external nodes can be numbered as y 1 ,...,

y 2 n − 1 such that if x j is the

weight of node y j ,wehave x 1

x 2 n − 1 . Furthermore, the nodes y 2 j − 1 and y 2 j

are offspring of the same parent node, or siblings, for 1

x 2 ···

n , and the node number for the

parent node is greater than y 2 j − 1 and y 2 j . These last two characteristics are called the sibling

property , and any tree that possesses this property is a Huffman tree [ 21 ].

In the adaptive Huffman coding procedure, neither transmitter nor receiver knows anything

about the statistics of the source sequence at the start of transmission. The tree at both the

transmitter and the receiver consists of a single node that corresponds to all symbols not yet

transmitted (NYT) and has a weight of zero. As transmission progresses, nodes corresponding

to symbols transmitted are added to the tree, and the tree is reconfigured using an update

procedure. Before the beginning of transmission, a fixed code for each symbol is agreed upon

between transmitter and receiver. A simple (short) code is as follows:

If the source has an alphabet

(

a 1 ,

a 2 ,...,

a m )

of size m , then pick e and r such that

2 e

2 e . The letter a k is encoded as the

r and 0

(

)

-bit binary representation

of k

−

1, if 1

2 r ;else, a k is encoded as the e -bit binary representation of k

−

For example, suppose m

10. The symbol a 1 is encoded as 00000,

the symbol a 2 is encoded as 00001, and the symbol a 22 is encoded as 1011.

When a symbol is encountered for the first time, the code for the NYT node is transmitted,

followed by the fixed code for the symbol. A node for the symbol is then created, and the

symbol is taken out of the NYT list.

Both transmitter and receiver start with the same tree structure. The updating procedure

used by both transmitter and receiver is identical. Therefore, the encoding and decoding

processes remain synchronized.

26, then e

4, and r

3.4.1 Update Procedure

The update procedure requires that the nodes be in a fixed order. This ordering is preserved by

numbering the nodes. The largest node number is given to the root of the tree, and the smallest

number is assigned to the NYT node. The numbers from the NYT node to the root of the tree

are assigned in increasing order from left to right and from lower level to upper level. The set

of nodes with the same weight makes up a block . Figure 3.12 is a flowchart of the updating

procedure.

The function of the update procedure is to preserve the sibling property. In order that the

update procedures at the transmitter and receiver both operate with the same information, the

Introduction to Data Compression

Search WWH ::

Custom Search

Home