Therefore, if we have a uniquely decodable code, the codeword lengths have to satisfy
the Kraft-McMillan inequality. And, given codeword lengths that satisfy the Kraft-McMillan
inequality, we can always find a prefix code with those codeword lengths. Thus, by restricting
ourselves to prefix codes, we are not in danger of overlooking nonprefix uniquely decodable
codes that have a shorter average length.
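As a quick illustration of the second half of this claim, the sketch below (in Python; the function names are our own and are not used elsewhere in the book) checks that a set of codeword lengths l_i satisfies the Kraft-McMillan inequality, sum over i of 2^(-l_i) <= 1, and then assigns codewords of exactly those lengths using the familiar canonical construction, shorter codewords first.

from typing import List

def kraft_mcmillan_sum(lengths: List[int]) -> float:
    # Left-hand side of the Kraft-McMillan inequality: sum of 2**(-l_i).
    return sum(2.0 ** -l for l in lengths)

def canonical_prefix_code(lengths: List[int]) -> List[str]:
    # Build a binary prefix code with the given codeword lengths, assuming
    # the lengths satisfy the Kraft-McMillan inequality.
    if kraft_mcmillan_sum(lengths) > 1.0:
        raise ValueError("lengths violate the Kraft-McMillan inequality")
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    codewords = [""] * len(lengths)
    code, prev_len = 0, 0
    for i in order:
        code <<= lengths[i] - prev_len      # extend the codeword to the new length
        codewords[i] = format(code, "0{}b".format(lengths[i]))
        code += 1
        prev_len = lengths[i]
    return codewords

# Lengths (1, 2, 3, 3) give 1/2 + 1/4 + 1/8 + 1/8 = 1; the construction
# returns the prefix code 0, 10, 110, 111.
print(canonical_prefix_code([1, 2, 3, 3]))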
2.5 Algorithmic Information Theory
The theory of information described in the previous sections is intuitively satisfying and has
useful applications. However, when dealing with real-world data, it does have some theoretical
difficulties. Suppose you were given the task of developing a compression scheme for use with
a specific set of documents. We can view the entire set as a single long string. You could
develop models for the data. Based on these models you could calculate probabilities using the
relative frequency approach. These probabilities could then be used to obtain an estimate of the
entropy and, thus, an estimate of the amount of compression available. All is well except for a
“fly in the ointment.” The string you have been given is fixed. There is nothing probabilistic
about it. There is no abstract source that will generate different sets of these specific documents
at different times. So how can we talk about the entropies without pretending that reality is
somehow different from what it actually is? Unfortunately, it is not clear that we can. Our
definition of entropy requires the existence of an abstract source. Our estimate of the entropy
is still useful. It will give us a very good idea of how much compression we can get. So,
practically speaking, information theory comes through. However, theoretically it seems there
is some pretending involved. Algorithmic information theory is a different way of looking at
information that has not been as useful in practice (and, therefore, we will not be looking at it a
whole lot) but it gets around this theoretical problem. At the heart of algorithmic information
theory is a measure called Kolmogorov complexity. This measure, while it bears the name of
one person, was actually discovered independently by three people: R. Solomonoff, who was
exploring machine learning; the Russian mathematician A.N. Kolmogorov; and G. Chaitin,
who was in high school when he came up with this idea.
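To make the relative-frequency estimate described above concrete, here is a minimal sketch in Python (the single-symbol, memoryless model is an assumption made purely for illustration; richer models would condition on preceding symbols):

import math
from collections import Counter

def empirical_entropy(data: str) -> float:
    # Estimate entropy in bits per symbol from relative symbol frequencies,
    # treating the string as the output of a memoryless source.
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A highly repetitive string yields a low estimate ...
print(empirical_entropy("aaaaaaaaab"))   # roughly 0.47 bits per symbol
# ... while a string that uses many symbols equally often does not.
print(empirical_entropy("abcdefgh"))     # 3.0 bits per symbol

The point of the discussion above is that this number, however useful, is an estimate built on a probabilistic model we have imposed on a string that is in fact fixed.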
The Kolmogorov complexity K(x) of a sequence x is the size of the program needed to
generate x. In this size we include all inputs that might be needed by the program. We do not
specify the programming language because it is always possible to translate a program in one
language into a program in another language at a fixed cost. If x were a sequence of all ones, a
highly compressible sequence, the program would simply be a print statement in a loop. At
the other extreme, if x were a random sequence with no structure, then the only program that
could generate it would have to contain the sequence itself. The program would be only slightly
larger than the sequence. Thus, there is a clear correspondence between the size of the
smallest program that can generate a sequence and the amount of compression that can be
obtained. Kolmogorov complexity seems to be the ideal measure to use in data compression.
The problem is that we do not know of any systematic way of computing or closely approximating
Kolmogorov complexity. Clearly, the size of any program that can generate a particular sequence is an
upper bound on the Kolmogorov complexity of that sequence. However, we have no way
of determining a lower bound. Thus, while the notion of Kolmogorov complexity is more satisfying as a description of the information in a particular sequence, in practice entropy remains the measure we rely on.
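The upper-bound observation can be made concrete with a toy comparison, sketched below in Python. Here "program size" is simply the character count of a source string handed to exec, which is only a stand-in for a proper measure of program length, and the particular bit string is an arbitrary example.

# A structured sequence: one million ones. A short program generates it, so
# the length of that program is an upper bound on the sequence's complexity.
structured = "1" * 1_000_000
short_program = 'x = "1" * 1_000_000'

# A sequence with no apparent structure: the only generating program written
# down here contains the sequence verbatim, so the bound it gives is slightly
# larger than the sequence itself.
unstructured = "1101000111010110010011101011000110111001"
literal_program = 'x = "{}"'.format(unstructured)

for name, seq, prog in [("structured", structured, short_program),
                        ("unstructured", unstructured, literal_program)]:
    scope = {}
    exec(prog, scope)         # run the candidate generating program
    assert scope["x"] == seq  # confirm it really produces the sequence
    print(name, "sequence length:", len(seq), "program length:", len(prog))

Finding a shorter program would lower the bound, but, as noted above, there is no systematic way of establishing how low it can go.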