Compression - Disappearing Cryptography

Cryptography Reference

In-Depth Information

the dictionary. If the value of the byte,

B

bytes are copied verbatim out of the original file. This scheme allows

the program to use flexible word sizes that work well with English.

There are many different schemes that are more efficient that others

in some cases.

The index into the dictionary does not need to be

B

, is greater than zero, then

bit numbers.

You can also count the occurrence of words in the dictionary and use

a Huffman-like scheme to devise short code words for some of them.

The tag for verbatim text is usually included as just another word in

this case.

The dictionary can also adapt as the file is processed. One sim-

ple technique is to keep track of the last time a word from it was

used. Whenever a section of verbatim text is encountered, the oldest

word is swapped out of the dictionary and the newest verbatim text

is swapped in. This is a great technique for adapting to the text be-

cause many words are often clustered in sections. For instance, the

words “dictionary,” “Huffman,” and “compression” are common in

this section but relatively rare in other parts of the topic. An adaptive

scheme would load these words into the dictionary at the beginning

of the section when they were first encountered and not swap them

out until they aren't used for a while.

Dictionary schemes can be quite effective for compressing arbi-

trary text, but they are difficult to run in reverse to make data mimic

something. Chapter 6 uses Huffman-like algorithms to generate real

text, but it doesn't include a section on reversing dictionary algo-

rithms. They are described in this chapter because compression is a

good way to save space and whiten data. The algorithms don't work

particularly well for mimicry because they require a well-constructed

dictionary. Inpractice, there is no good automatic way that I know for

constructing a good one.

n

5.3.3 JPEG Compression

The Huffman encoding described in “Huffman Compression” (see

page 80) and the dictionary schemes in “Dictionary Compression”

(see page 82) are ideal for arbitrary collections of data. They can

also work quite well on some types of image files, but they fail on

others. If an image has a small number of colors that may occur in

a predictable pattern, then both of these algorithms may do a good

job of finding a pattern that is strong enough to generate a good

compression. This often doesn't happen because the images contain

many shades of colors. The Japanese flag, for instance, has one red

circle that is a constant color, but a realistically lit apple has many

Disappearing Cryptography

Search WWH ::

Custom Search

Home