Cryptography Reference
In-Depth Information
the dictionary. If the value of the byte,
B
bytes are copied verbatim out of the original file. This scheme allows
the program to use flexible word sizes that work well with English.
There are many different schemes that are more efficient that others
in some cases.
The index into the dictionary does not need to be
B
, is greater than zero, then
bit numbers.
You can also count the occurrence of words in the dictionary and use
a Huffman-like scheme to devise short code words for some of them.
The tag for verbatim text is usually included as just another word in
this case.
The dictionary can also adapt as the file is processed. One sim-
ple technique is to keep track of the last time a word from it was
used. Whenever a section of verbatim text is encountered, the oldest
word is swapped out of the dictionary and the newest verbatim text
is swapped in. This is a great technique for adapting to the text be-
cause many words are often clustered in sections. For instance, the
words “dictionary,” “Huffman,” and “compression” are common in
this section but relatively rare in other parts of the topic. An adaptive
scheme would load these words into the dictionary at the beginning
of the section when they were first encountered and not swap them
out until they aren't used for a while.
Dictionary schemes can be quite effective for compressing arbi-
trary text, but they are difficult to run in reverse to make data mimic
something. Chapter 6 uses Huffman-like algorithms to generate real
text, but it doesn't include a section on reversing dictionary algo-
rithms. They are described in this chapter because compression is a
good way to save space and whiten data. The algorithms don't work
particularly well for mimicry because they require a well-constructed
dictionary. Inpractice, there is no good automatic way that I know for
constructing a good one.
n
5.3.3 JPEG Compression
The Huffman encoding described in “Huffman Compression” (see
page 80) and the dictionary schemes in “Dictionary Compression”
(see page 82) are ideal for arbitrary collections of data. They can
also work quite well on some types of image files, but they fail on
others. If an image has a small number of colors that may occur in
a predictable pattern, then both of these algorithms may do a good
job of finding a pattern that is strong enough to generate a good
compression. This often doesn't happen because the images contain
many shades of colors. The Japanese flag, for instance, has one red
circle that is a constant color, but a realistically lit apple has many
Search WWH ::




Custom Search