Java Reference
In-Depth Information
For instance, suppose that the encoded string is 010011110001011000
1000111. Figure 12.3 shows that 0 and 01 are not character codes but that 010
represents i, so the first character is i. Then 011 follows, which is an s. Then
11 follows, which is a newline ( nl ). The remainder of the code is a, sp, t, i, e,
and nl .
The character codes can be different lengths, as long as no character code
is a prefix of another character code, an encoding called a prefix code. Con-
versely, if a character is contained in a nonleaf node, guaranteeing unambigu-
ous decoding is no longer possible.
Thus our basic problem is to find the full binary tree of minimum cost (as
defined previously) in which all characters are contained in the leaves. The
tree shown in Figure 12.4 is optimal for our sample alphabet. As shown in
Figure 12.5, this code requires only 146 bits. There are many optimal codes,
which can be obtained by swapping children in the encoding tree.
In a prefix code , no
character code is a
prefix of another
character code.
This is guaranteed
if the characters
are only in leaves. A
prefix code can
be decoded
unambiguously.
figure 12.4
An optimal prefix
code tree
i
sp
e
a
t
s
nl
figure 12.5
Optimal prefix code
Character
Code
Frequency
Total Bits
a
001
10
30
e
01
15
30
i
10
12
24
s
00000
3
15
t
0001
4
16
sp
11
13
26
nl
00001
1
5
Total
146
 
 
Search WWH ::




Custom Search