Dictionary Techniques - Introduction to Data Compression

Databases Reference

In-Depth Information

GIF and V.42 bis. While the LZW algorithm was initially the algorithm of choice, patent

concerns led to increasing use of the LZ77 algorithm. The most popular implementation of

the LZ77 algorithm is the deflate algorithm initially designed by Phil Katz. It is part of the

popular zlib library developed by Jean-Loup Gailly and Mark Adler. Jean-Loup Gailly also

used deflate in the widely used gzip algorithm. The deflate algorithm is also used in PNG,

which we describe below.

5.5.1 File Compression—UNIX compress

The UNIX compress command is one of the earlier applications of LZW. The size of the

dictionary is adaptive. We start with a dictionary of size 512. This means that the transmitted

codewords are 9 bits long. Once the dictionary has filled up, the size of the dictionary is

doubled to 1024 entries. The codewords transmitted at this point have 10 bits. The size of

the dictionary is progressively doubled as it fills up. In this way, during the earlier part of the

coding process when the strings in the dictionary are not very long, the codewords used to

encode them also have fewer bits. The maximum size of the codeword, b max , can be set by the

user to between 9 and 16, with 16 bits being the default. Once the dictionary contains 2 b max

entries, compress becomes a static dictionary coding technique. At this point the algorithm

monitors the compression ratio. If the compression ratio falls below a threshold, the dictionary

is flushed, and the dictionary building process is restarted. This way, the dictionary always

reflects the local characteristics of the source.

5.5.2 Image Compression—The Graphics Interchange

Format (GIF)

The Graphics Interchange Format (GIF) was developed by CompuServe Information Service

to encode graphical images. It is another implementation of the LZW algorithm and is very

similar to the compress command. The compressed image is stored with the first byte being

the minimum number of bits b per pixel in the original image. For the images we have been

using as examples, this would be eight. The binary number 2 b is defined to be the clear

code . This code is used to reset all compression and decompression parameters to a start-up

state. The initial size of the dictionary is 2 b + 1 . When this fills up, the dictionary size is

doubled, as was done in the compress algorithm, until the maximum dictionary size of 4096

is reached. At this point, the compression algorithm behaves like a static dictionary algorithm.

The codewords from the LZW algorithm are stored in blocks of characters. The characters

are 8 bits long, and the maximum block size is 255. Each block is preceded by a header that

contains the block size. The block is terminated by a block terminator consisting of eight 0s.

The end of the compressed image is denoted by an end-of-information code with a value of

2 b

1. This codeword should appear before the block terminator.

GIF has become quite popular for encoding all kinds of images, both computer-generated

and “natural” images. While GIF works well with computer-generated graphical images and

pseudocolor or color-mapped images, it is generally not the most efficient way to losslessly

compress images of natural scenes, photographs, satellite images, and so on. In Table 5.16 ,

we give the file sizes for the GIF-encoded test images. For comparison, we also include the

file sizes for arithmetic coding the original images and arithmetic coding the differences.

+

Introduction to Data Compression

Search WWH ::

Custom Search

Home