Cryptography Reference
In-Depth Information
0 1
0 1
0 1
space
0 1
0 1
0 1
0 1
0 1
0 1
0 1
0 1
0 1
E
T
A
I
O
S
N
R
0 1
0 1
0 1
L
D
H
C
0 1
0 1
M
F
G
B
Figure 5.2: The top of the tree built from the data in Table 5.1. The
generated the codes shown in Table 5.2. Only the top part is shown
here because of space considerations. Less common letters like “Z”
are in the missing part of the tree corresponding to the prefix 0001.
set to be the sum of the old trees that were joined.
3. Repeat the previous step until there is only one tree left. The
I know of a Greek
labyrinth which is a
single straight line.
Along this line so many
philosophers have lost
themselves that a mere
detective might well do
so too.
—Jorge Luis Borges in
Death and the Compass
codes can be constructed by following the path between the
root and the leaves.
The characters with the smallest weights are joined together first.
Each joining process adds another layer between the root and the
leaves. So it is easy to see how the least common letters get pushed
far away from the root where they have a longer code word. The
most common letters aren't incorporated until the end, so they end
up near the top.
The algorithm naturally balances the tree by always taking the
smallest weights first. The weight for a tree represents the number of
times that any of the characters in the tree will occur in a file. You can
prove that the tree constructed by this algorithm is the best possible
tree by imagining what happens if you mistakenly choose the wrong
two trees to join at a step. More common characters get pushed
farther from the root and get longer code words than less common
characters do. The average compression drops.
Many other people have extended the theme of Huffman coding
by creating other algorithms that use the addresses of nodes in a
tree. One popular technique is to use Splay trees where the trees
are modified every time a character is encoded. One version moves
the letter to the top of the tree in a complex move that preserves
much of the structure. The result is that the most common letters
bubble up to the top. The constant rearrangement of the tree means
that the tree adapts to the local conditions. This type of algorithm
would be ideal for compressing a dictionary where even the least
popular letters like “j” or “z” are common in sections. When the
Search WWH ::




Custom Search