Cryptography Reference
In-Depth Information
used to convert a piece of text into a much shorter one, which becomes illeg-
ible in the process. Another program can be used to retrieve the original text
from the compressed text at any given time. There is a large number of com-
pression methods, where the effect (how much a file will shrink) depends on
both the file contents and the method. The compress program implements a
well-known and very effective method based on the Ziv-Lempel algorithm .
compress used to be on every UNIX system, including FreeBSD. ( gzip , which
has become very popular lately, is based on the Deflate algorithm of zip ;it
cannot be used here — see Wikipedia entries on gzip .) It's extremely easy to
use compress . The example below is from a UNIX session (the number in front
of the month shows the file length in bytes).
$ ls -l vigc_crk.*
-rw-r--r--
1 wobst
other
5211 Nov
6 13:51 vigc_crk.c
$ compress -v vigc_crk.*
vigc_crk.c: Compression: 45.63% -- replaced by vigc_crk.c.Z
$ ls -l vigc_crk.*
-rw-r--r--
1 wobst
other
2833 Nov
6 13:51 vigc_crk.c.Z
$
How does this algorithm work? Most importantly, it replaces entire character
strings by numbers, which makes the compressed product short. This replace-
ment is done based on a table that stores character strings. At program start, the
table has 257 entries, namely all 256 one-element character strings formed from
all possible bytes, and an additional abortion code and, sometimes, an addi-
tional reset code (depending on the implementation). The table is expanded
in every step as the uncompressed text is read. If a character string that has
already been saved to the table is encountered in the text, then the algorithm
outputs the number of that table entry in its place. This is roughly how the
Ziv - Lempel algorithm works. If you are interested in the details, you'll find
an exact description in [Welch].
A little trick helps to save even more space: initially, the table includes less than
512 entries, i.e., the numbers of entries can be described using 9-bit numbers.
Once 256 9-bit words have been output, we can use 10-bit numbers; after
another 512 steps, we can use 11-bit numbers, and so on, till 16-bit numbers
are output. The program authors don't generally say what will happen once
16 bits have been exhausted (our example won't reach these spheres). This
sort of output with variable word length is called the adaptive Ziv-Lempel
Search WWH ::




Custom Search