Java Reference
In-Depth Information
a reverse process from that used to generate the codes. Decoding a bit string begins
at the root of the tree. We take branches depending on the bit value — left for '0'
and right for '1' — until reaching a leaf node. This leaf contains the first character
in the message. We then process the next bit in the code restarting at the root to
begin the next character.
Example5.10 To decode the bit string “1011001110111101” we begin
at the root of the tree and take a right branch for the first bit which is '1.'
Because the next bit is a '0' we take a left branch. We then take another
right branch (for the third bit '1'), arriving at the leaf node corresponding
to the letter D. Thus, the first letter of the coded word is D. We then begin
again at the root of the tree to process the fourth bit, which is a '1.' Taking
a right branch, then two left branches (for the next two bits which are '0'),
we reach the leaf node corresponding to the letter U. Thus, the second letter
is U. In similar manner we complete the decoding process to find that the
last two letters are C and K, spelling the word “DUCK.”
A set of codes is said to meet the prefix property if no code in the set is the
prefix of another. The prefix property guarantees that there will be no ambiguity in
how a bit string is decoded. In other words, once we reach the last bit of a code
during the decoding process, we know which letter it is the code for. Huffman codes
certainly have the prefix property because any prefix for a code would correspond to
an internal node, while all codes correspond to leaf nodes. For example, the code
for M is '11111.' Taking five right branches in the Huffman tree of Figure 5.26
brings us to the leaf node containing M. We can be sure that no letter can have code
'111' because this corresponds to an internal node of the tree, and the tree-building
process places letters only at the leaf nodes.
How efficient is Huffman coding? In theory, it is an optimal coding method
whenever the true frequencies are known, and the frequency of a letter is indepen-
dent of the context of that letter in the message. In practice, the frequencies of
letters in an English text document do change depending on context. For example,
while E is the most commonly used letter of the alphabet in English documents,
T is more common as the first letter of a word. This is why most commercial com-
pression utilities do not use Huffman coding as their primary coding method, but
instead use techniques that take advantage of the context for the letters.
Another factor that affects the compression efficiency of Huffman coding is the
relative frequencies of the letters. Some frequency patterns will save no space as
compared to fixed-length codes; others can result in great compression. In general,
Huffman coding does better when there is large variation in the frequencies of
letters.
In the particular case of the frequencies shown in Figure 5.31, we can
 
Search WWH ::




Custom Search