Binary Trees - Data Structures and Algorithm Analysis

Java Reference

In-Depth Information

a reverse process from that used to generate the codes. Decoding a bit string begins

at the root of the tree. We take branches depending on the bit value — left for '0'

and right for '1' — until reaching a leaf node. This leaf contains the first character

in the message. We then process the next bit in the code restarting at the root to

begin the next character.

Example5.10 To decode the bit string “1011001110111101” we begin

at the root of the tree and take a right branch for the first bit which is '1.'

Because the next bit is a '0' we take a left branch. We then take another

right branch (for the third bit '1'), arriving at the leaf node corresponding

to the letter D. Thus, the first letter of the coded word is D. We then begin

again at the root of the tree to process the fourth bit, which is a '1.' Taking

a right branch, then two left branches (for the next two bits which are '0'),

we reach the leaf node corresponding to the letter U. Thus, the second letter

is U. In similar manner we complete the decoding process to find that the

last two letters are C and K, spelling the word “DUCK.”

A set of codes is said to meet the prefix property if no code in the set is the

prefix of another. The prefix property guarantees that there will be no ambiguity in

how a bit string is decoded. In other words, once we reach the last bit of a code

during the decoding process, we know which letter it is the code for. Huffman codes

certainly have the prefix property because any prefix for a code would correspond to

an internal node, while all codes correspond to leaf nodes. For example, the code

for M is '11111.' Taking five right branches in the Huffman tree of Figure 5.26

brings us to the leaf node containing M. We can be sure that no letter can have code

'111' because this corresponds to an internal node of the tree, and the tree-building

process places letters only at the leaf nodes.

How efficient is Huffman coding? In theory, it is an optimal coding method

whenever the true frequencies are known, and the frequency of a letter is indepen-

dent of the context of that letter in the message. In practice, the frequencies of

letters in an English text document do change depending on context. For example,

while E is the most commonly used letter of the alphabet in English documents,

T is more common as the first letter of a word. This is why most commercial com-

pression utilities do not use Huffman coding as their primary coding method, but

instead use techniques that take advantage of the context for the letters.

Another factor that affects the compression efficiency of Huffman coding is the

relative frequencies of the letters. Some frequency patterns will save no space as

compared to fixed-length codes; others can result in great compression. In general,

Huffman coding does better when there is large variation in the frequencies of

letters.

In the particular case of the frequencies shown in Figure 5.31, we can

Search WWH ::

Custom Search

Home