into them. Many image-compression functions are designed to be lossy. That means that the reconstructed image may look very similar to the original image, but it won't be exactly the same. If the functions that describe an image can be fitted more loosely, then the algorithms can use fewer of them and produce a smaller compressed output. For instance, an apple might be encoded as a blob of solid red instead of a smooth continuum of various shades of red. When the image is decompressed, much of the smaller detail is lost but the overall picture still looks good. These compression functions can easily compress an image to be one-fifth to one-tenth of its original size. This is why they are so popular.
The television format example from the beginning of the chapter is an example of lossy compression. The formats are not enough to recreate the entire program; they're a better example of lossy compression where a surrogate is found.
5.2.1 Huffman Coding
A good way to understand basic compression is to examine a simple algorithm like Huffman coding. This technique analyzes the frequency with which each letter occurs in a file and then replaces each letter with a variable-length code word. Normally, each letter is stored as a byte, which takes up 8 bits of information. Some estimates of the entropy of standard English, though, put it at just over 3 bits per letter. Obviously there is room to squeeze out almost 5/8ths of a file of English text. The trick is to assign the short code words to the most common letters and long code words to the least common letters. Although some of the long code words will end up being longer than 8 bits, the net result will still be shorter. The common letters will have the greatest effect.
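Here is a minimal sketch of the technique in Python. The book itself gives no code, so this is an illustration, not the author's implementation. It counts letter frequencies, repeatedly merges the two rarest nodes into a tree, and then reads each letter's code off the path from the root, so the most frequent letters end up nearest the root with the shortest codes:

    import heapq
    from collections import Counter

    def huffman_codes(text):
        """Build a prefix code that gives frequent characters short bit strings."""
        freq = Counter(text)
        # Each heap entry is (frequency, tiebreaker, node); a node is either a
        # single character (a leaf) or a (left, right) pair (an internal node).
        heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
        heapq.heapify(heap)
        tiebreaker = len(heap)
        if tiebreaker == 1:               # degenerate one-symbol input
            return {heap[0][2]: "0"}
        while len(heap) > 1:              # merge the two rarest nodes repeatedly
            f1, _, left = heapq.heappop(heap)
            f2, _, right = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, tiebreaker, (left, right)))
            tiebreaker += 1
        codes = {}
        def walk(node, prefix):           # left edges emit 0, right edges emit 1
            if isinstance(node, tuple):
                walk(node[0], prefix + "0")
                walk(node[1], prefix + "1")
            else:
                codes[node] = prefix
        walk(heap[0][2], "")
        return codes

The tiebreaker integers only keep the heap comparisons well defined when two nodes have equal frequency; they do not change the lengths of the resulting codes.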
Table 5.1 shows the number of occurrences of each letter in several different opinions from the United States Supreme Court. The space is the most common character, followed by the letter “E”. This table was constructed by mixing lower- and uppercase letters for simplicity. An actual compression function would keep separate entries for each form as well as an entry for every type of punctuation mark. In general, there would be 256 entries, one for each possible byte.
Table 5.2 shows a set of codes that were constructed for each letter using the data in Table 5.1. The most common character, the space, gets a code that is only 2 bits long: 01. Many of the other common characters get codes that are 4 bits long. The least common character, “Z”, gets an 11-bit code: 00011010001. If these codes were used to encode data, then it should be easy to reduce a file to less than one-half of its original size.
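One way to check a claim like that is to weight each code's length by its letter's frequency. This short follow-up to the sketch above totals the bits the derived codes would use for a sample file and reports the result as a fraction of the 8-bit-per-character original (the file name is hypothetical; any large English text will do):

    sample = open("opinions.txt").read()        # hypothetical file of English text
    codes = huffman_codes(sample)
    compressed_bits = sum(len(codes[ch]) for ch in sample)
    print(compressed_bits / (8 * len(sample)))  # fraction of the original size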
Here's a simple example: the 48 bits used to store the word “ARTHUR” in normal ASCII shrink to 27 bits in compressed form.
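The letter-by-letter breakdown depends on the Table 5.2 codes, which aren't reproduced here. As a stand-in, this snippet encodes the word with whatever codes the sketch above derives from its sample text; the exact bit count will match the book's 27 only if the codes are built from the same frequency data:

    word = "ARTHUR"                             # the sample must contain A, R, T, H, U
    encoded = "".join(codes[ch] for ch in word)
    print(len(word) * 8, "->", len(encoded))    # e.g., 48 -> 27 with the book's codes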