algorithm moved through the “j” part of the dictionary, the node
containing “j” would be pushed repeatedly to the top of the splay tree
where it would get a short code word. Later, when the compression
function got to the “z” section, the node for “z” would end up near the
top, consistently giving “z” a short code word. Obviously one major
problem with this compression scheme is that the entire file must
be processed from the beginning to keep an accurate description of
the splay tree. You can't simply jump to the “z” section and begin
decompressing.
A good basic reference on compression is [Sto88].
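The payoff of pushing recently used symbols toward short codes can be illustrated with a deliberately simplified stand-in: a move-to-front coder. This is not the splay-tree algorithm itself, and the function name and alphabet are my own choices, but it shows the same locality effect: once a symbol has been used, repeating it costs the minimum possible index.

```python
# A simplified illustration of the locality idea behind the splay-tree
# coder: a move-to-front list. Recently used symbols sit near the front
# and get small indices, which a back-end coder would turn into short
# code words. (Sketch only -- not the splay-tree algorithm itself.)
import string

def mtf_encode(text, alphabet=string.ascii_lowercase):
    table = list(alphabet)
    out = []
    for ch in text:
        i = table.index(ch)
        out.append(i)
        table.insert(0, table.pop(i))  # push the symbol to the front
    return out

print(mtf_encode("jjjz"))  # → [9, 0, 0, 25]: after the first "j", repeats cost 0
```

Just as with the splay tree, a run of “j”s quickly becomes cheap, and the coder must process the file from the beginning to keep the table in sync.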
This basic Huffman algorithm has many different uses. It will be used
in Chapter 6 to turn data into something that looks like English text.
Huffman encoding is also used as a building block in Chapter 7 to
make optimal weighted choices between different words. The same
structure is as useful there as it is here.
5.3.2 Dictionary Compression
Compression schemes like the popular and patented Lempel-Ziv al-
gorithm are called dictionary schemes because they build a big list
of common words in the file.² This list can either be created in one
swoop at the beginning of the compression or it could be built and
changed adaptively as the algorithm processes the file. The algo-
rithms succeed because a pointer describing the position in the dic-
tionary takes up much less space than the common word itself.
The dictionary is just a list of words. It is almost always 2^n words
because that makes the pointer to a particular word take up n bits.
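A quick check of that arithmetic (the function name here is my own):

```python
# A 2**n-word dictionary needs exactly n-bit pointers, no matter how
# long the individual words in the dictionary are.
import math

def pointer_bits(dictionary_size):
    return math.ceil(math.log2(dictionary_size))

print(pointer_bits(4096))  # a 4096-word dictionary → 12-bit pointers
```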
Each word can either be a fixed length or a flexible length. Fixed
lengths are easier to handle, but flexible lengths do a better job of
approximating the English language and x86 machine code.
Compression follows easily. First, the file is analyzed to create a
list of the 2^n most common words. Then the file is processed by
scanning from beginning to end. If the current word is in the dictionary,
then it is replaced by a tag, <InDict> , followed by the position in the
dictionary. If it isn't in the dictionary, then it is replaced by a tag,
<Verbatim> , followed by the word that remains unchanged.
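The procedure can be sketched in a few lines. The token layout below, with Python tuples standing in for the <InDict> and <Verbatim> tags, is an illustrative choice rather than a real file format:

```python
# A toy version of the dictionary scheme: the 2**n most common words
# get dictionary slots; everything else passes through verbatim.
from collections import Counter

def build_dictionary(words, n):
    """Keep the 2**n most common words."""
    return [w for w, _ in Counter(words).most_common(2 ** n)]

def compress(words, dictionary):
    index = {w: i for i, w in enumerate(dictionary)}
    out = []
    for w in words:
        if w in index:
            out.append(("InDict", index[w]))   # pointer into the dictionary
        else:
            out.append(("Verbatim", w))        # word left unchanged
    return out

def decompress(tokens, dictionary):
    return [dictionary[x] if tag == "InDict" else x for tag, x in tokens]

words = "the cat sat on the mat the cat".split()
d = build_dictionary(words, 2)                 # at most 2**2 = 4 entries
assert decompress(compress(words, d), d) == words
```

Common words like “the” compress to a small dictionary index, while a rare word like “mat” is carried verbatim at full cost plus the tag overhead.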
Obviously, the success of the algorithm depends on the size of the
tags ( <InDict> and <Verbatim> ), the size of the dictionary, and the
number of times something is found in the dictionary. One basic and
usually effective solution is to make the tags be one entire byte, B. If
the value of the byte is zero, then the next n bits represent a word in
² The algorithms are not particularly good at compressing files like dictionaries
used by humans. The fact that I used a regular dictionary as an example in the previous
section is just a coincidence. Don't be confused.