Cryptography Reference
In-Depth Information
0
1
0
1
1
0
A
S
W
01
E
T
Figure 5.1: A small Huffman tree. The code for each letter is deter-
mined by following the path between the root of the tree and the leaf
containing a particular letter. The letter “T”, for instance, receives the
code 110.
5.3.1 Huffman Compression
Huffman compression is easy to understand and construct. Let the
set of characters be Σ and let
ρ
(
c
) be the probability that a particular
character,
, occurs in a text file. Constructing such a frequency table
is easily done by analyzing a source file. It is usually done on a case-
by-case basis and is stored in the header to the compressed version,
but it can also be done in advance and used again and again.
The basic idea is to construct a binary tree that contains all of the
characters at the leaves. Each branche is labeled with either a zero or
a one. The path between the root and the leaf specifies the code used
for each letter. Figure 5.1 shows this for a small set of letters.
The key is to construct the tree so that the most common letters
occur near the top of the tree. This can be accomplished with a
relatively easy process:
c
1. Start with one node for each character. This node is also a
simple tree. The weight of this tree is set to be the probability
that the character associated with the tree occurs in the file.
Call the trees for
t i and the weight
w
(
n i ) .Thevalueof
i
changes
as the number of trees change.
2. Find the two trees with the smallest weight. Glue these into one
tree by constructing a new node with two branches connected
to the roots of the two trees. One branch will be labeled with a
one and the other will get a zero. The weight of this new tree is
Search WWH ::




Custom Search