When the number of entries in the dictionary exceeds a prearranged threshold C3, the
encoder sends the STEPUP control code, and the codeword size is incremented by 1 bit. At
the same time, the threshold C3 is also doubled. When all available dictionary entries are filled,
the algorithm initiates a reuse procedure. The location of the first string entry in the dictionary
is maintained in a variable N5. Starting from N5, a counter C1 is incremented until it finds a
dictionary entry that is not a prefix to any other dictionary entry. The fact that this entry is not
a prefix to another dictionary entry means that this pattern has not been encountered since it
was created. Furthermore, because of the way it was located, this pattern has been around the
longest among patterns of this kind. This reuse procedure enables the algorithm to continually
prune the dictionary of strings that may have been encountered in the past but have not been
encountered recently. In this way, the dictionary is always matched to the current
source statistics.
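As a rough illustration of this maintenance logic, the sketch below shows both the STEPUP step and the prefix test used by the reuse procedure. It is a minimal sketch in Python, not the recommendation's actual pseudocode: apart from C3, N5, and C1, the names are assumptions of mine, and the real scan also wraps around the string-entry region, which is omitted here.

```python
# Minimal sketch of the V.42 bis dictionary-maintenance logic described
# above. Apart from C3, N5, and C1, all names are illustrative; the real
# reuse scan also wraps around the string-entry region (omitted here).

def maybe_stepup(num_entries, codeword_size, c3, send):
    """When the number of entries exceeds the threshold C3, send the
    STEPUP control code, widen the codeword by 1 bit, and double C3."""
    if num_entries > c3:
        send("STEPUP")
        codeword_size += 1
        c3 *= 2
    return codeword_size, c3

def find_reusable_entry(entries, n5):
    """Scan from N5 (the first string entry) for an entry that is not a
    prefix of any other entry; such an entry has not been matched since
    it was created, and scanning in order finds the oldest one."""
    for c1 in range(n5, len(entries)):
        candidate = entries[c1]
        if not any(other != candidate and other.startswith(candidate)
                   for other in entries):
            return c1            # this dictionary slot can be reused
    return None
```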
To reduce the effect of errors, the CCITT recommends setting a maximum string length.
This maximum length is negotiated at link setup. The CCITT recommends a range of 6-250,
with a default value of 6.
The V.42 bis recommendation avoids the need for an exception handler for the case where
the decoder receives a codeword corresponding to an incomplete entry by forbidding the use
of the last entry in the dictionary. Instead of transmitting the codeword corresponding to
the last entry, the recommendation requires the sending of the codewords corresponding to
the constituents of the last entry. In the example used to demonstrate this quirk of the LZW
algorithm, the V.42 bis recommendation would have forced us to send the codewords 3 and 1
instead of transmitting the codeword 5.
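The following is a sketch of how an encoder might honor this rule, assuming a dictionary that maps strings to integer codewords; the function name and arguments are illustrative, not taken from the recommendation. Because every multi-character LZW entry is an existing entry extended by one character, the two constituents, the prefix entry and the final character, are guaranteed to have codewords of their own.

```python
# Sketch of the "forbidden last entry" rule; names are illustrative.
# dictionary maps strings to integer codewords, and last_entry is the
# string entry created most recently.

def emit_codeword(match, dictionary, last_entry, send):
    if match == last_entry:
        # Send the codewords of the entry's constituents instead: its
        # prefix (itself an entry, since LZW builds entries one character
        # at a time) and its final character. In the example referred to
        # above, this means sending codewords 3 and 1 rather than 5.
        send(dictionary[match[:-1]])
        send(dictionary[match[-1]])
    else:
        send(dictionary[match])
```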
5.6 Beyond Compression: Lempel-Ziv Complexity
In Chapter 2, we described the notion of Kolmogorov complexity, noting that the Kolmogorov
complexity of a sequence was not something that we could necessarily compute. Lempel and
Ziv provided a definition of complexity in 1976 [57] that is computable. This notion views
complexity as the level of diversity in a sequence. Thus, a sequence that consists of repetitions
of the same letter would not be a complex sequence, while a sequence that appears random
would be complex. The development of this definition of complexity formed the basis for
the two compression approaches that were to follow in the next two years. Since then, this
definition of complexity has had an impact in areas other than compression. We will take a brief
detour into these applications as they confirm our assertion that data compression algorithms
are ways of modeling and understanding how information is organized in data.
The computation of Lempel-Ziv (LZ) complexity is performed in a manner very similar
to the LZ77 algorithm. The basic idea is to parse the input sequence into phrases that do not
exist in the past of the sequence. Thus, a phrase is built up by copying from the past of the
sequence until we add a letter that makes the phrase distinct from anything in the past of the
sequence. Let's see how this works by using a familiar sequence, cabracadabrarrarrad. We
begin with an empty history, so the first letter c is a unique phrase, as are the letters a, b, and r.
The next letter in the sequence is a, which has occurred in the past, so we add the next letter,
c, to form the new phrase ac.
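The sketch below implements this parsing in Python (the function name is mine, not from the text). Definitions in the literature differ on small points, such as whether the copy source may overlap the phrase being built; this version allows overlap, in the spirit of LZ77, and the number of resulting phrases is the complexity.

```python
def lz_complexity(s):
    """Parse s into phrases that have not occurred in the past of the
    sequence; the phrase count is the Lempel-Ziv (1976) complexity."""
    phrases = []
    i = 0                              # start of the current phrase
    while i < len(s):
        length = 1
        # Grow the phrase while s[i:i+length] can still be found starting
        # before position i (the copy may overlap the phrase itself).
        while (i + length <= len(s)
               and s.find(s[i:i + length], 0, i + length - 1) != -1):
            length += 1
        phrases.append(s[i:i + length])
        i += length
    return phrases

phrases = lz_complexity("cabracadabrarrarrad")
print(len(phrases), phrases)
# 8 ['c', 'a', 'b', 'r', 'ac', 'ad', 'abrar', 'rarrad']
```

Under these assumptions the example sequence parses into eight phrases, beginning with the single letters c, a, b, and r and the two-letter phrases ac and ad built up as described above.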