Java Reference
In-Depth Information
The general strategy is to allow the code length to vary from character to
character and to ensure that frequently occurring characters have short codes.
If all characters occur with the same or very similar frequency, you cannot
expect any savings.
In a variable-length
code, the most-
frequent characters
have the shortest
representation.
12.1.1 prefix codes
The binary code presented in Figure 12.1 can be represented by the binary
tree shown in Figure 12.2. In this data structure, called a binary trie (pro-
nounced “try”), characters are stored only in leaf nodes; the representation of
each character is found by starting at the root and recording the path, using a 0
to indicate the left branch and a 1 to indicate the right branch. For instance, s
is reached by going left, then right, and finally right. This is encoded as 011. If
character is at depth and occurs f i times, the cost of the code is d i f i .
We can obtain a better code than the one given in Figure 12.2 by recog-
nizing that nl is an only child. By placing it one level higher (replacing its par-
ent), we obtain the new tree shown in Figure 12.3. This new tree has a cost of
173 but is still far from optimal.
Note that the tree in Figure 12.3 is a full tree, in which all nodes either are
leaves or have two children. An optimal code always has this property; other-
wise, as already shown, nodes with only one child could move up a level. If
the characters are placed only at the leaves, any sequence of bits can always
be decoded unambiguously.
In a binary trie , a left
branch represents
0 and a right
branch represents 1.
The path to a node
indicates its
representation.
c i
d i
In a full tree , all
nodes either are
leaves or have two
children.
figure 12.2
Representation of the
original code by a tree
a
e
i
s
t
sp
nl
figure 12.3
A slightly better tree
nl
a
e
i
s
t
sp
 
Search WWH ::




Custom Search