into them. Many image-compression functions are designed to be lossy. That means that the reconstructed image may look very similar to the original image, but it won't be exactly the same. If the functions that describe an image can be fitted more loosely, then the algorithms can use fewer of them and produce a smaller compressed output. For instance, an apple might be encoded as a blob of solid red instead of a smooth continuum of various shades of red. When the image is decompressed, much of the smaller detail is lost but the overall picture still looks good. These compression functions can easily compress an image to be one-fifth to one-tenth of its original size. This is why they are so popular.
The television format example from the beginning of the chapter is an example of lossy compression. The formats are not enough to recreate the entire program; they're a better example of lossy compression where a surrogate is found.
5.2.1 Huffman Coding
A good way to understand basic compression is to examine a simple algorithm like Huffman coding. This technique analyzes the frequency with which each letter occurs in a file and then replaces each letter with a variable-length code word. Normally, each letter is stored as a byte, which takes up 8 bits of information. Some estimates of the entropy of standard English, though, put it at just over 3 bits per letter. Obviously there is room to squeeze out almost 5/8ths of a file of English text. The trick is to assign the short code words to the most common letters and long code words to the least common letters. Although some of the long code words will end up being longer than 8 bits, the net result will still be shorter. The common letters will have the greatest effect.
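Here is a minimal sketch of the technique in Python. The book itself gives no code, so this is an illustration, not the author's implementation. It counts letter frequencies, repeatedly merges the two rarest nodes into a tree, and then reads each letter's code off the path from the root, so the most frequent letters end up nearest the root with the shortest codes:

    import heapq
    from collections import Counter

    def huffman_codes(text):
        """Build a prefix code that gives frequent characters short bit strings."""
        freq = Counter(text)
        # Each heap entry is (frequency, tiebreaker, node); a node is either a
        # single character (a leaf) or a (left, right) pair (an internal node).
        heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
        heapq.heapify(heap)
        tiebreaker = len(heap)
        if tiebreaker == 1:               # degenerate one-symbol input
            return {heap[0][2]: "0"}
        while len(heap) > 1:              # merge the two rarest nodes repeatedly
            f1, _, left = heapq.heappop(heap)
            f2, _, right = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, tiebreaker, (left, right)))
            tiebreaker += 1
        codes = {}
        def walk(node, prefix):           # left edges emit 0, right edges emit 1
            if isinstance(node, tuple):
                walk(node[0], prefix + "0")
                walk(node[1], prefix + "1")
            else:
                codes[node] = prefix
        walk(heap[0][2], "")
        return codes

The tiebreaker integers only keep the heap comparisons well defined when two nodes have equal frequency; they do not change the lengths of the resulting codes.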
Table 5.1 shows the number of occurrences of each letter in several different opinions from the United States Supreme Court. The space is the most common character, followed by the letter “E”. This table was constructed by mixing lower- and uppercase letters for simplicity. An actual compression function would keep separate entries for each form as well as an entry for every type of punctuation mark. In general, there would be 256 entries, one for each possible byte.
Table 5.2 shows a set of codes that were constructed for each letter using the data in Table 5.1. The most common character, the space, gets a code that is only 2 bits long: 01. Many of the other common characters get codes that are 4 bits long. The least common character, “Z”, gets an 11-bit code: 00011010001. If these codes were used to encode data, then it should be easy to reduce a file to less than one-half of its original size.
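One way to check a claim like that is to weight each code's length by its letter's frequency. This short follow-up to the sketch above totals the bits the derived codes would use for a sample file and reports the result as a fraction of the 8-bit-per-character original (the file name is hypothetical; any large English text will do):

    sample = open("opinions.txt").read()        # hypothetical file of English text
    codes = huffman_codes(sample)
    compressed_bits = sum(len(codes[ch]) for ch in sample)
    print(compressed_bits / (8 * len(sample)))  # fraction of the original size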
Here's a simple example: the 48 bits used to store the word “ARTHUR” in normal ASCII shrink to 27 bits in compressed form.
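The letter-by-letter breakdown depends on the Table 5.2 codes, which aren't reproduced here. As a stand-in, this snippet encodes the word with whatever codes the sketch above derives from its sample text; the exact bit count will match the book's 27 only if the codes are built from the same frequency data:

    word = "ARTHUR"                             # the sample must contain A, R, T, H, U
    encoded = "".join(codes[ch] for ch in word)
    print(len(word) * 8, "->", len(encoded))    # e.g., 48 -> 27 with the book's codes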