Therefore, if we have a uniquely decodable code, the codeword lengths have to satisfy
the Kraft-McMillan inequality. And, given codeword lengths that satisfy the Kraft-McMillan
inequality, we can always find a prefix code with those codeword lengths. Thus, by restricting
ourselves to prefix codes, we are not in danger of overlooking nonprefix uniquely decodable
codes that have a shorter average length.
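As a quick illustration of the second half of this claim, the sketch below (in Python; the function names are our own and are not used elsewhere in the book) checks that a set of codeword lengths l_i satisfies the Kraft-McMillan inequality, sum over i of 2^(-l_i) <= 1, and then assigns codewords of exactly those lengths using the familiar canonical construction, shorter codewords first.

from typing import List

def kraft_mcmillan_sum(lengths: List[int]) -> float:
    # Left-hand side of the Kraft-McMillan inequality: sum of 2**(-l_i).
    return sum(2.0 ** -l for l in lengths)

def canonical_prefix_code(lengths: List[int]) -> List[str]:
    # Build a binary prefix code with the given codeword lengths, assuming
    # the lengths satisfy the Kraft-McMillan inequality.
    if kraft_mcmillan_sum(lengths) > 1.0:
        raise ValueError("lengths violate the Kraft-McMillan inequality")
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    codewords = [""] * len(lengths)
    code, prev_len = 0, 0
    for i in order:
        code <<= lengths[i] - prev_len      # extend the codeword to the new length
        codewords[i] = format(code, "0{}b".format(lengths[i]))
        code += 1
        prev_len = lengths[i]
    return codewords

# Lengths (1, 2, 3, 3) give 1/2 + 1/4 + 1/8 + 1/8 = 1; the construction
# returns the prefix code 0, 10, 110, 111.
print(canonical_prefix_code([1, 2, 3, 3]))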
2.5 Algorithmic Information Theory
The theory of information described in the previous sections is intuitively satisfying and has
useful applications. However, when dealing with real-world data, it does have some theoretical
difficulties. Suppose you were given the task of developing a compression scheme for use with
a specific set of documents. We can view the entire set as a single long string. You could
develop models for the data. Based on these models you could calculate probabilities using the
relative frequency approach. These probabilities could then be used to obtain an estimate of the
entropy and, thus, an estimate of the amount of compression available. All is well except for a
“fly in the ointment.” The string you have been given is fixed. There is nothing probabilistic
about it. There is no abstract source that will generate different sets of these specific documents
at different times. So how can we talk about the entropies without pretending that reality is
somehow different from what it actually is? Unfortunately, it is not clear that we can. Our
definition of entropy requires the existence of an abstract source. Our estimate of the entropy
is still useful. It will give us a very good idea of how much compression we can get. So,
practically speaking, information theory comes through. However, theoretically it seems there
is some pretending involved. Algorithmic information theory is a different way of looking at
information that has not been as useful in practice (and, therefore, we will not be looking at it a
whole lot) but it gets around this theoretical problem. At the heart of algorithmic information
theory is a measure called Kolmogorov complexity. This measure, while it bears the name of
one person, was actually discovered independently by three people: R. Solomonoff, who was
exploring machine learning; the Russian mathematician A.N. Kolmogorov; and G. Chaitin,
who was in high school when he came up with this idea.
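To make the relative-frequency estimate described above concrete, here is a minimal sketch in Python (the single-symbol, memoryless model is an assumption made purely for illustration; richer models would condition on preceding symbols):

import math
from collections import Counter

def empirical_entropy(data: str) -> float:
    # Estimate entropy in bits per symbol from relative symbol frequencies,
    # treating the string as the output of a memoryless source.
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A highly repetitive string yields a low estimate ...
print(empirical_entropy("aaaaaaaaab"))   # roughly 0.47 bits per symbol
# ... while a string that uses many symbols equally often does not.
print(empirical_entropy("abcdefgh"))     # 3.0 bits per symbol

The point of the discussion above is that this number, however useful, is an estimate built on a probabilistic model we have imposed on a string that is in fact fixed.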
The Kolmogorov complexity K(x) of a sequence x is the size of the program needed to
generate x. In this size we include all inputs that might be needed by the program. We do not
specify the programming language because it is always possible to translate a program in one
language into a program in another language at a fixed cost. If x were a sequence of all ones, a
highly compressible sequence, the program would simply be a print statement in a loop. At
the other extreme, if x were a random sequence with no structure, then the only program that
could generate it would have to contain the sequence itself. The program would be only slightly
larger than the sequence. Thus, there is a clear correspondence between the size of the
smallest program that can generate a sequence and the amount of compression that can be
obtained. Kolmogorov complexity seems to be the ideal measure to use in data compression.
The problem is that we do not know of any systematic way of computing or closely approximating
Kolmogorov complexity. Clearly, the size of any program that can generate a particular sequence is an
upper bound on the Kolmogorov complexity of that sequence. However, we have no way
of determining a lower bound. Thus, while the notion of Kolmogorov complexity is more satisfying as a description of the information in a particular sequence, in practice entropy remains the measure we rely on.
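The upper-bound observation can be made concrete with a toy comparison, sketched below in Python. Here "program size" is simply the character count of a source string handed to exec, which is only a stand-in for a proper measure of program length, and the particular bit string is an arbitrary example.

# A structured sequence: one million ones. A short program generates it, so
# the length of that program is an upper bound on the sequence's complexity.
structured = "1" * 1_000_000
short_program = 'x = "1" * 1_000_000'

# A sequence with no apparent structure: the only generating program written
# down here contains the sequence verbatim, so the bound it gives is slightly
# larger than the sequence itself.
unstructured = "1101000111010110010011101011000110111001"
literal_program = 'x = "{}"'.format(unstructured)

for name, seq, prog in [("structured", structured, short_program),
                        ("unstructured", unstructured, literal_program)]:
    scope = {}
    exec(prog, scope)         # run the candidate generating program
    assert scope["x"] == seq  # confirm it really produces the sequence
    print(name, "sequence length:", len(seq), "program length:", len(prog))

Finding a shorter program would lower the bound, but, as noted above, there is no systematic way of establishing how low it can go.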