Letter   ASCII      Compressed
A        01000001   1000
R        01010010   1111
T        01010100   0010
H        01001000   00111
U        01010101   000100
R        01010010   1111
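For concreteness, here is a minimal sketch (in Python) of how the table above is applied. The dictionary simply copies the code words from the Compressed column; entries for the rest of the alphabet are assumed to come from the same construction described later in the chapter.

    # Code words copied from the table above; the rest of the alphabet is omitted.
    CODE = {"A": "1000", "R": "1111", "T": "0010", "H": "00111", "U": "000100"}

    def encode(text):
        # Concatenate the code word for each letter in turn.
        return "".join(CODE[letter] for letter in text)

    bits = encode("ARTHUR")
    print(bits)                                   # 27 bits in all
    print(len("ARTHUR") * 8, "ASCII bits vs", len(bits), "compressed bits")

The six letters shrink from 48 ASCII bits to 27 bits, roughly a 44 percent saving.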
The Huffman algorithm can also be used to compress any type of data, but its effectiveness varies. For instance, it could be used on a photograph where the intensity at each pixel is stored as a byte. The algorithm would be very effective on a photograph that had only a few basic values of black and white, but it wouldn't work well if the intensities were spread evenly across many shades between dark and light. The algorithm works best when there are only a few common values.
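To see why the distribution matters, the sketch below builds code words from a frequency table using the standard Huffman construction (the construction itself is explained in the next section). The two sample tables, one for a mostly black-and-white image and one for evenly spread intensities, are invented for illustration.

    import heapq
    from itertools import count

    def huffman_codes(freq):
        # Build a Huffman tree by repeatedly merging the two rarest nodes,
        # then read the code words off the tree.
        tick = count()                    # tie-breaker keeps heap entries comparable
        heap = [(f, next(tick), sym) for sym, f in freq.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            f1, _, left = heapq.heappop(heap)
            f2, _, right = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, next(tick), (left, right)))
        codes = {}
        def walk(node, prefix):
            if isinstance(node, tuple):   # internal node: descend both branches
                walk(node[0], prefix + "0")
                walk(node[1], prefix + "1")
            else:
                codes[node] = prefix or "0"
        walk(heap[0][2], "")
        return codes

    # A nearly black-and-white image: two intensities dominate, so they get
    # 1- and 2-bit codes while the rare intensities get longer ones.
    skewed = {0: 900, 255: 800, 128: 20, 64: 10}
    print(huffman_codes(skewed))

    # Evenly spread intensities: every value ends up with a code of the same
    # length, so nothing is gained over plain fixed-length storage.
    uniform = {v: 100 for v in range(16)}
    print(huffman_codes(uniform))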
More sophisticated versions of the Huffman code exist. It is common to construct second-order codes that aggregate pairs of letters. This can be done in two ways. The easiest way to do this is to simply treat each pair of letters as the basic atomic unit. Instead of constructing a frequency table of characters, you would construct a table of pairs. The table would be much larger, but it would generate even better compression because many of the pairs would rarely occur. Pairs like "ZF" are almost nonexistent.
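A sketch of this first approach, assuming a short made-up sample string: the text is split into non-overlapping two-letter blocks and the blocks are tallied, and the resulting table could be fed straight into the same Huffman construction sketched above.

    from collections import Counter

    def pair_frequencies(text):
        # Count non-overlapping two-letter blocks of the text.
        return Counter(text[i:i + 2] for i in range(0, len(text) - 1, 2))

    freq = pair_frequencies("THETHEMETHATTHEYHATE")
    print(freq.most_common())   # common pairs such as "TH" dominate; "ZF" never appears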
Another way is to construct 26 different tables by analyzing which letters follow other letters. One table for the letter "T" would hold the frequency with which the other letters follow a "T". The letter "H" would be quite common in this table because "TH" occurs frequently in English. These 26 tables would produce even more compression because the more detailed analysis tunes the code words even more closely to the data. The letter "U" would receive a very short code word after the letter "Q" because it invariably follows.
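A sketch of this second approach, again with an invented sample string (spaces are counted as ordinary symbols here): one frequency table is kept per preceding letter, and each table could then drive its own Huffman code.

    from collections import Counter, defaultdict

    def conditional_tables(text):
        # For each letter, count which letters follow it.
        tables = defaultdict(Counter)
        for prev, nxt in zip(text, text[1:]):
            tables[prev][nxt] += 1
        return tables

    tables = conditional_tables("THE QUEEN QUOTED THE QUIET QUESTION")
    print(tables["Q"])   # "U" follows "Q" every time, so it would earn a very short code
    print(tables["T"])   # "H" is the most common letter after "T", as noted above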
This example has shown how a Huffman compression function works in practice. It didn't explain how the code words were constructed nor did it show why they worked so well. The next section in this chapter will do that.
Chapter 6 shows how to run Huffman codes in reverse.
5.3 Building Compression Algorithms
Creating a new compression algorithm has been one of the more lucrative areas of mathematics and computer science. A few smart ideas are enough to save people billions of dollars in storage space and communications time, so many have explored the idea in depth. This chapter won't investigate the best of that work because it is beyond the scope of the topic. Many of the easiest ideas turn out to hide information the best: Huffman codes are a perfect solution for basic text, while dictionary algorithms like Lempel-Ziv are less effective.