Databases Reference
In-Depth Information
GIF and V.42 bis. While the LZW algorithm was initially the algorithm of choice, patent
concerns led to increasing use of the LZ77 algorithm. The most popular implementation of
the LZ77 algorithm is the deflate algorithm initially designed by Phil Katz. It is part of the
popular zlib library developed by Jean-Loup Gailly and Mark Adler. Jean-Loup Gailly also
used deflate in the widely used gzip algorithm. The deflate algorithm is also used in PNG,
which we describe below.
5.5.1 File Compression—UNIX compress
The UNIX compress command is one of the earlier applications of LZW. The size of the
dictionary is adaptive. We start with a dictionary of size 512. This means that the transmitted
codewords are 9 bits long. Once the dictionary has filled up, the size of the dictionary is
doubled to 1024 entries. The codewords transmitted at this point have 10 bits. The size of
the dictionary is progressively doubled as it fills up. In this way, during the earlier part of the
coding process when the strings in the dictionary are not very long, the codewords used to
encode them also have fewer bits. The maximum size of the codeword, b max , can be set by the
user to between 9 and 16, with 16 bits being the default. Once the dictionary contains 2 b max
entries, compress becomes a static dictionary coding technique. At this point the algorithm
monitors the compression ratio. If the compression ratio falls below a threshold, the dictionary
is flushed, and the dictionary building process is restarted. This way, the dictionary always
reflects the local characteristics of the source.
5.5.2 Image Compression—The Graphics Interchange
Format (GIF)
The Graphics Interchange Format (GIF) was developed by CompuServe Information Service
to encode graphical images. It is another implementation of the LZW algorithm and is very
similar to the compress command. The compressed image is stored with the first byte being
the minimum number of bits b per pixel in the original image. For the images we have been
using as examples, this would be eight. The binary number 2 b is defined to be the clear
code . This code is used to reset all compression and decompression parameters to a start-up
state. The initial size of the dictionary is 2 b + 1 . When this fills up, the dictionary size is
doubled, as was done in the compress algorithm, until the maximum dictionary size of 4096
is reached. At this point, the compression algorithm behaves like a static dictionary algorithm.
The codewords from the LZW algorithm are stored in blocks of characters. The characters
are 8 bits long, and the maximum block size is 255. Each block is preceded by a header that
contains the block size. The block is terminated by a block terminator consisting of eight 0s.
The end of the compressed image is denoted by an end-of-information code with a value of
2 b
1. This codeword should appear before the block terminator.
GIF has become quite popular for encoding all kinds of images, both computer-generated
and “natural” images. While GIF works well with computer-generated graphical images and
pseudocolor or color-mapped images, it is generally not the most efficient way to losslessly
compress images of natural scenes, photographs, satellite images, and so on. In Table 5.16 ,
we give the file sizes for the GIF-encoded test images. For comparison, we also include the
file sizes for arithmetic coding the original images and arithmetic coding the differences.
+
Search WWH ::




Custom Search