It is easy to see that the first observation is correct. If symbols that occur more often had codewords that were longer than the codewords for symbols that occur less often, the average number of bits per symbol would be larger than if the conditions were reversed. Therefore, a code that assigns longer codewords to symbols that occur more frequently cannot be optimum.
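To make the arithmetic behind the first observation concrete, here is a minimal sketch (the two-symbol probabilities and codeword lengths are illustrative, not taken from the text): assigning the longer codeword to the more probable symbol strictly increases the average length.

```python
# Minimal sketch of observation 1 (probabilities and lengths are illustrative).
# Average codeword length = sum of P(symbol) * length(codeword) over all symbols.

def average_length(probs, lengths):
    return sum(p * l for p, l in zip(probs, lengths))

probs = [0.7, 0.3]                       # symbol 1 occurs more often than symbol 2

good = average_length(probs, [1, 2])     # shorter codeword for the frequent symbol
bad  = average_length(probs, [2, 1])     # same lengths, but swapped

print(good)   # 1.3 bits/symbol
print(bad)    # 1.7 bits/symbol -- giving the longer codeword to the
              # more frequent symbol can only increase the average
```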
To see why the second observation holds true, consider the following situation. Suppose an optimum code C exists in which the two codewords corresponding to the two least probable symbols do not have the same length. Suppose the longer codeword is k bits longer than the shorter codeword. Because this is a prefix code, the shorter codeword cannot be a prefix of the longer codeword. This means that even if we drop the last k bits of the longer codeword, the two codewords would still be distinct. As these codewords correspond to the least probable symbols in the alphabet, no other codeword can be longer than these codewords; therefore, there is no danger that the shortened codeword would become the prefix of some other codeword. Furthermore, by dropping these k bits we obtain a new code that has a shorter average length than C. But this violates our initial contention that C is an optimal code. Therefore, for an optimal code the second observation also holds true.
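The truncation argument can be checked on a small concrete case (the code and probabilities below are illustrative, not from the text): dropping the extra bit of the longer of the two least probable codewords leaves a valid prefix code with a smaller average length.

```python
# Illustrative check of observation 2 (code and probabilities are made up).

def is_prefix_free(codewords):
    """True if no codeword is a proper prefix of another."""
    return not any(a != b and b.startswith(a) for a in codewords for b in codewords)

def average_length(code, probs):
    return sum(probs[s] * len(w) for s, w in code.items())

probs = {"a": 0.5, "b": 0.3, "c": 0.1, "d": 0.1}

# The two least probable symbols, c and d, have codewords of different lengths.
code = {"a": "0", "b": "10", "c": "110", "d": "1110"}

# Dropping the last bit of d's codeword keeps the code prefix-free
# and lowers the average length.
shorter = dict(code, d="111")

print(is_prefix_free(code.values()), is_prefix_free(shorter.values()))   # True True
print(average_length(code, probs), average_length(shorter, probs))       # 1.8 1.7
```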
The Huffman procedure adds a simple requirement to these two observations. This requirement is that the codewords corresponding to the two lowest probability symbols differ only in the last bit. Assume that γ and δ are the two least probable symbols in an alphabet. If the codeword for γ is m∗0, the codeword for δ would be m∗1. Here, m is a string of 1s and 0s, and ∗ denotes concatenation.
This requirement does not violate our two observations and leads to a very simple encoding
procedure. We describe this procedure with the help of the following example.
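In code form, the procedure suggested by these observations and the added requirement can be sketched as follows (the function name huffman and the dictionary interface are mine, not from the text): repeatedly merge the two least probable symbols into a composite symbol, design a code for the reduced alphabet, and then append a 0 and a 1 to the composite symbol's codeword. The worked example traces the same steps by hand.

```python
# A sketch of the Huffman procedure described above
# (function name and interface are illustrative, not from the text).

def huffman(probs):
    """probs: dict mapping symbol -> probability; returns dict symbol -> codeword."""
    if len(probs) == 2:
        # Two symbols: their codewords are 0 and 1, differing only in the last bit.
        hi, lo = sorted(probs, key=probs.get, reverse=True)
        return {hi: "0", lo: "1"}

    # Merge the two least probable symbols into a single composite symbol.
    ordered = sorted(probs, key=probs.get)
    x, y = ordered[0], ordered[1]
    reduced = {s: p for s, p in probs.items() if s not in (x, y)}
    reduced[(x, y)] = probs[x] + probs[y]

    # Design a code for the reduced alphabet, then split the composite codeword:
    # the two least probable symbols get codewords that differ only in the last bit.
    code = huffman(reduced)
    m = code.pop((x, y))
    code[x] = m + "0"
    code[y] = m + "1"
    return code
```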
Example 3.2.1: Design of a Huffman Code
Let us design a Huffman code for a source that puts out letters from an alphabet A = {a1, a2, a3, a4, a5} with P(a1) = P(a3) = 0.2, P(a2) = 0.4, and P(a4) = P(a5) = 0.1. The entropy for this source is 2.122 bits/symbol. To design the Huffman code, we first sort the letters in descending order of probability, as shown in Table 3.1. Here c(ai) denotes the codeword for ai.
The two symbols with the lowest probability are a4 and a5. Therefore, we can assign them codewords that differ only in the last bit.
TABLE 3.1 The initial five-letter alphabet.

Letter    Probability    Codeword
a2        0.4            c(a2)
a1        0.2            c(a1)
a3        0.2            c(a3)
a4        0.1            c(a4)
a5        0.1            c(a5)
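As a cross-check, we can apply the huffman() sketch from above to the alphabet of Table 3.1 (assuming that function is in scope; the exact bit patterns depend on how ties among equal probabilities are broken, but any Huffman code for this source averages 2.2 bits/symbol, against an entropy of 2.122 bits/symbol):

```python
# Applying the huffman() sketch above to the alphabet of Table 3.1.
from math import log2

probs = {"a2": 0.4, "a1": 0.2, "a3": 0.2, "a4": 0.1, "a5": 0.1}
code = huffman(probs)

avg = sum(probs[s] * len(code[s]) for s in probs)
entropy = -sum(p * log2(p) for p in probs.values())

print(code)               # one valid prefix code; codewords depend on tie-breaking
print(round(avg, 3))      # 2.2 bits/symbol
print(round(entropy, 3))  # 2.122 bits/symbol
```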