Compression - Disappearing Cryptography - page 74

Cryptography Reference

In-Depth Information

0 1

0 1

0 1

space

0 1

0 1

0 1

0 1

0 1

0 1

0 1

0 1

0 1

E

T

A

I

O

S

N

R

0 1

0 1

0 1

L

D

H

C

0 1

0 1

M

F

G

B

Figure 5.2: The top of the tree built from the data in Table 5.1. The

generated the codes shown in Table 5.2. Only the top part is shown

here because of space considerations. Less common letters like “Z”

are in the missing part of the tree corresponding to the prefix 0001.

set to be the sum of the old trees that were joined.

3. Repeat the previous step until there is only one tree left. The

I know of a Greek

labyrinth which is a

single straight line.

Along this line so many

philosophers have lost

themselves that a mere

detective might well do

so too.

—Jorge Luis Borges in

Death and the Compass

codes can be constructed by following the path between the

root and the leaves.

The characters with the smallest weights are joined together first.

Each joining process adds another layer between the root and the

leaves. So it is easy to see how the least common letters get pushed

far away from the root where they have a longer code word. The

most common letters aren't incorporated until the end, so they end

up near the top.

The algorithm naturally balances the tree by always taking the

smallest weights first. The weight for a tree represents the number of

times that any of the characters in the tree will occur in a file. You can

prove that the tree constructed by this algorithm is the best possible

tree by imagining what happens if you mistakenly choose the wrong

two trees to join at a step. More common characters get pushed

farther from the root and get longer code words than less common

characters do. The average compression drops.

Many other people have extended the theme of Huffman coding

by creating other algorithms that use the addresses of nodes in a

tree. One popular technique is to use Splay trees where the trees

are modified every time a character is encoded. One version moves

the letter to the top of the tree in a complex move that preserves

much of the structure. The result is that the most common letters

bubble up to the top. The constant rearrangement of the tree means

that the tree adapts to the local conditions. This type of algorithm

would be ideal for compressing a dictionary where even the least

popular letters like “j” or “z” are common in sections. When the

Next Page

Disappearing Cryptography

Search WWH ::

Custom Search

Home