TABLE 3.12  Huffman code for the alphabet A.

    Letter    Codeword
    a1        0
    a2        11
    a3        10
Example 3.2.4: Consider a source that puts out iid letters from the alphabet A = {a1, a2, a3} with the probability model P(a1) = 0.8, P(a2) = 0.02, and P(a3) = 0.18. The entropy for this source is 0.816 bits/symbol. A Huffman code for this source is shown in Table 3.12. The average length for this code is 1.2 bits/symbol. The difference between the average code length and the entropy, or the redundancy, for this code is 0.384 bits/symbol, which is 47% of the entropy. This means that to code this sequence, we would need 47% more bits than the minimum required.
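The numbers in this example are easy to check directly. The following sketch (plain Python; the dictionary names are ours, chosen for illustration) computes the entropy, the average codeword length from Table 3.12, and the redundancy:

```python
import math

# Probability model and Huffman code from Example 3.2.4 / Table 3.12
p = {"a1": 0.8, "a2": 0.02, "a3": 0.18}
code = {"a1": "0", "a2": "11", "a3": "10"}

# Entropy: H(S) = -sum_i P(a_i) log2 P(a_i), in bits/symbol
entropy = -sum(q * math.log2(q) for q in p.values())

# Average codeword length: sum_i P(a_i) * len(codeword_i)
avg_len = sum(p[s] * len(code[s]) for s in p)

# Redundancy: average length minus entropy
redundancy = avg_len - entropy

print(round(entropy, 3))                  # 0.816 bits/symbol
print(round(avg_len, 3))                  # 1.2 bits/symbol
print(round(redundancy, 3))               # 0.384 bits/symbol
print(round(100 * redundancy / entropy))  # 47 (% of entropy)
```

The large redundancy here comes from the very skewed probabilities: a1 carries most of the probability mass but still costs a full bit per occurrence.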
We can sometimes reduce the coding rate by blocking more than one symbol together. To see how this can happen, consider a source S that emits a sequence of letters from an alphabet A = {a1, a2, ..., am}. Each element of the sequence is generated independently of the other elements in the sequence. The entropy for this source is given by

    H(S) = - sum_{i=1}^{m} P(a_i) log2 P(a_i)
We know that we can generate a Huffman code for this source with rate R such that

    H(S) <= R < H(S) + 1        (5)

We have used the looser bound here; the same argument can be made with the tighter bound. Notice that we have used "rate R" to denote the number of bits per symbol. This is a standard convention in the data compression literature. However, in the communication literature, the word "rate" often refers to the number of bits per second.

Suppose we now encode the sequence by generating one codeword for every n symbols. As there are m^n combinations of n symbols, we will need m^n codewords in our Huffman code. We could generate this code by viewing the m^n symbols as letters of an extended alphabet

    A^(n) = {a1 a1 ... a1, a1 a1 ... a2, ..., a1 a1 ... am, a1 a1 ... a2 a1, ..., am am ... am}

where each block is n symbols long, from a source S^(n). Let us denote the rate for the new source as R^(n). Then we know that

    H(S^(n)) <= R^(n) < H(S^(n)) + 1        (6)

R^(n) is the number of bits required to code n symbols. Therefore, the number of bits required per symbol, R, is given by

    R = (1/n) R^(n)
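To make the blocking argument concrete, here is a small sketch (plain Python; the helper `huffman_lengths` and the choice n = 2 are ours, not from the text) that builds a Huffman code for the source of Example 3.2.4 and for its second extension, then compares the rates per original symbol:

```python
import heapq
import itertools

p = {"a1": 0.8, "a2": 0.02, "a3": 0.18}

def huffman_lengths(probs):
    # Huffman's algorithm: repeatedly merge the two least-probable
    # groups; each merge adds one bit to every symbol in the group.
    heap = [(q, i, (s,)) for i, (s, q) in enumerate(probs.items())]
    heapq.heapify(heap)
    depth = {s: 0 for s in probs}
    uid = itertools.count(len(heap))  # tie-breaker for equal probabilities
    while len(heap) > 1:
        q1, _, g1 = heapq.heappop(heap)
        q2, _, g2 = heapq.heappop(heap)
        for s in g1 + g2:
            depth[s] += 1
        heapq.heappush(heap, (q1 + q2, next(uid), g1 + g2))
    return depth

# Rate of the single-symbol code (bits per symbol)
len1 = huffman_lengths(p)
rate1 = sum(p[s] * len1[s] for s in p)

# Second extension: one codeword per pair of symbols (n = 2),
# with P(a_i a_j) = P(a_i) P(a_j) since the source is iid
p2 = {(s, t): p[s] * p[t] for s in p for t in p}
len2 = huffman_lengths(p2)
rate2 = sum(p2[k] * len2[k] for k in p2) / 2  # bits per original symbol

print(round(rate1, 4), round(rate2, 4))  # 1.2 0.8614
```

Coding pairs of symbols drops the rate from 1.2 to about 0.8614 bits/symbol, much closer to the entropy of 0.816 bits/symbol, in line with the bound R < H(S) + 1/n for the n-symbol extension.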