we are more likely to compress certain types of files than others; the algorithm can then be designed to compress those types of data better.
Thus, the main lesson from the argument is not that one risks big losses, but merely that one cannot always win. Choosing an algorithm always implicitly means selecting a subset of all files that will become usefully shorter. This is the theoretical reason why different compression algorithms are needed for different kinds of files: there cannot be any algorithm that is good for all kinds of data.
The "trick" that allows lossless compression algorithms, used on the type of data they
were designed for, to consistently compress such files to a shorter form is that the files
the algorithms are designed to act on all have some form of easily-modeled redundancy
that the algorithm is designed to remove, and thus belong to the subset of files that that
algorithm can make shorter, whereas other files would not get compressed or even get
bigger. Algorithms are generally quite specifically tuned to a particular type of file: for
example, lossless audio compression programs do not work well on text files, and vice
versa .
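As an illustrative sketch (added here, not part of the original text), a general-purpose byte-oriented compressor such as the one in Python's zlib module is tuned to the kind of repetition found in text-like data, and shrinks a highly redundant input dramatically:

import zlib

# Highly redundant input: the same phrase repeated many times.
text = b"the quick brown fox jumps over the lazy dog. " * 1000

compressed = zlib.compress(text, 9)

print(len(text))        # 45000 bytes in the original
print(len(compressed))  # only a few hundred bytes: the repetition is easy to model
assert zlib.decompress(compressed) == text  # lossless: the original is recovered exactly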
In particular, files of random data cannot be consistently compressed by any conceivable
lossless data compression algorithm: indeed, this result is used to define the concept of
randomness in algorithmic complexity theory.
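A small sketch along the same lines (again an illustration assumed here, with zlib standing in for any real compressor): random bytes typically come out slightly larger, because the compressed format adds overhead and the data contains no redundancy to exploit.

import os
import zlib

random_data = os.urandom(100_000)           # 100,000 bytes with no exploitable structure
compressed = zlib.compress(random_data, 9)

print(len(random_data))  # 100000
print(len(compressed))   # slightly more than 100000: format overhead with no savings
assert zlib.decompress(compressed) == random_data  # still lossless, just not shorter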
An algorithm that could losslessly compress any data stream is provably impossible. While there have been many claims over the years of companies achieving "perfect compression", in which an arbitrary number N of random bits can always be compressed to N-1 bits, such claims can be safely discarded without examining any further details of the purported compression scheme. Such an algorithm would contradict fundamental laws of mathematics because, if it existed, it could be applied repeatedly to losslessly reduce any file to length 0. Allegedly "perfect" compression algorithms are usually derided as "magic" compression algorithms.
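The underlying counting (pigeonhole) argument can be spelled out in a few lines. The sketch below, using an arbitrarily chosen bit length n, simply counts how many distinct outputs would be needed against how many strictly shorter bit strings exist:

# Pigeonhole sketch: a lossless compressor must map distinct inputs to
# distinct outputs, yet there are fewer shorter strings than inputs.
n = 16  # arbitrary example bit length

inputs = 2 ** n                                  # number of distinct n-bit files
shorter_outputs = sum(2 ** k for k in range(n))  # bit strings of length 0 .. n-1

print(inputs)           # 65536
print(shorter_outputs)  # 65535: always exactly one fewer than the inputs
# So for every n, at least one n-bit input cannot be mapped to a strictly
# shorter string without colliding with another input.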
On the other hand, it has also been proven that there is no algorithm to determine whether a file is incompressible in the sense of Kolmogorov complexity; hence, given any particular file, even one that appears random, it may still be possible to compress it significantly, even when the size of the decompressor is included. An example is the digits of the mathematical constant pi, which appear random but can be generated by a very small program. However, even though it cannot be determined whether a particular file is incompressible, a simple theorem about incompressible strings shows that over 99% of files of any given length cannot be compressed by more than one byte (including the size of the decompressor).
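The "over 99%" figure follows from the same counting argument, this time in bytes. The following sketch (an illustration added here, with an arbitrary length n) bounds the fraction of n-byte files that could be mapped to any strictly shorter file at all:

# Fraction of n-byte files that can be mapped to any strictly shorter file.
n = 100  # arbitrary example length in bytes

files_of_length_n = 256 ** n
shorter_files = sum(256 ** k for k in range(n))  # all files of 0 .. n-1 bytes

fraction = shorter_files / files_of_length_n
print(f"{fraction:.4%}")  # about 0.3922%, i.e. fewer than 1 file in 255
# Over 99% of n-byte files therefore cannot be compressed at all, let alone
# by more than one byte once the decompressor's size is also counted.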
Mathematical background
Any compression algorithm can be viewed as a function that maps sequences of units (normally octets) into other sequences of the same units. Compression is successful if the resulting sequence, together with the map needed to decompress it, is shorter than the original sequence.
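A minimal sketch of this view (added for illustration, with zlib used as one concrete instance of such a map; the names compress and decompress are placeholders, not an API from the text):

import zlib

def compress(data: bytes) -> bytes:
    # One concrete instance of the abstract map on octet sequences.
    return zlib.compress(data)

def decompress(data: bytes) -> bytes:
    # The inverse map; composing it with compress must return the input exactly.
    return zlib.decompress(data)

original = b"an octet sequence " * 64
result = compress(original)

assert decompress(result) == original  # losslessness: the map is invertible
# Success also has to account for the size of the decompression map itself,
# not just the lengths of the two sequences.
print(len(result) < len(original))     # True for this redundant input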