we are more likely to compress certain types of files than others; the algorithm can then be designed to compress those types of data better.
Thus, the main lesson from the argument is not that one risks big losses, but merely that one cannot always win. Choosing an algorithm always implicitly means selecting a subset of all files that will become usefully shorter. This is the theoretical reason why different compression algorithms are needed for different kinds of files: there cannot be any algorithm that is good for all kinds of data.
The "trick" that allows lossless compression algorithms, used on the type of data they
were designed for, to consistently compress such files to a shorter form is that the files
the algorithms are designed to act on all have some form of easily-modeled redundancy
that the algorithm is designed to remove, and thus belong to the subset of files that that
algorithm can make shorter, whereas other files would not get compressed or even get
bigger. Algorithms are generally quite specifically tuned to a particular type of file: for
example, lossless audio compression programs do not work well on text files, and vice
versa .
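As an illustrative sketch (added here, not part of the original text), a general-purpose byte-oriented compressor such as the one in Python's zlib module is tuned to the kind of repetition found in text-like data, and shrinks a highly redundant input dramatically:

import zlib

# Highly redundant input: the same phrase repeated many times.
text = b"the quick brown fox jumps over the lazy dog. " * 1000

compressed = zlib.compress(text, 9)

print(len(text))        # 45000 bytes in the original
print(len(compressed))  # only a few hundred bytes: the repetition is easy to model
assert zlib.decompress(compressed) == text  # lossless: the original is recovered exactly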
In particular, files of random data cannot be consistently compressed by any conceivable
lossless data compression algorithm: indeed, this result is used to define the concept of
randomness in algorithmic complexity theory.
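A small sketch along the same lines (again an illustration assumed here, with zlib standing in for any real compressor): random bytes typically come out slightly larger, because the compressed format adds overhead and the data contains no redundancy to exploit.

import os
import zlib

random_data = os.urandom(100_000)           # 100,000 bytes with no exploitable structure
compressed = zlib.compress(random_data, 9)

print(len(random_data))  # 100000
print(len(compressed))   # slightly more than 100000: format overhead with no savings
assert zlib.decompress(compressed) == random_data  # still lossless, just not shorter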
An algorithm that could losslessly compress any data stream is provably impossible. While there have been many claims over the years of companies achieving "perfect compression", in which an arbitrary number N of random bits can always be compressed to N-1 bits, such claims can be safely discarded without examining any further details of the purported compression scheme. Such an algorithm would contradict fundamental laws of mathematics because, if it existed, it could be applied repeatedly to losslessly reduce any file to length 0. Allegedly "perfect" compression algorithms are usually derided as "magic" compression algorithms.
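The underlying counting (pigeonhole) argument can be spelled out in a few lines. The sketch below, using an arbitrarily chosen bit length n, simply counts how many distinct outputs would be needed against how many strictly shorter bit strings exist:

# Pigeonhole sketch: a lossless compressor must map distinct inputs to
# distinct outputs, yet there are fewer shorter strings than inputs.
n = 16  # arbitrary example bit length

inputs = 2 ** n                                  # number of distinct n-bit files
shorter_outputs = sum(2 ** k for k in range(n))  # bit strings of length 0 .. n-1

print(inputs)           # 65536
print(shorter_outputs)  # 65535: always exactly one fewer than the inputs
# So for every n, at least one n-bit input cannot be mapped to a strictly
# shorter string without colliding with another input.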
On the other hand, it has also been proven that there is no algorithm to determine whether a file is incompressible in the sense of Kolmogorov complexity; hence, given any particular file, even one that appears random, it may still be possible to compress it significantly, even when the size of the decompressor is included. An example is the digits of the mathematical constant pi, which appear random but can be generated by a very small program. However, even though it cannot be determined whether a particular file is incompressible, a simple theorem about incompressible strings shows that over 99% of files of any given length cannot be compressed by more than one byte (including the size of the decompressor).
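The "over 99%" figure follows from the same counting argument, this time in bytes. The following sketch (an illustration added here, with an arbitrary length n) bounds the fraction of n-byte files that could be mapped to any strictly shorter file at all:

# Fraction of n-byte files that can be mapped to any strictly shorter file.
n = 100  # arbitrary example length in bytes

files_of_length_n = 256 ** n
shorter_files = sum(256 ** k for k in range(n))  # all files of 0 .. n-1 bytes

fraction = shorter_files / files_of_length_n
print(f"{fraction:.4%}")  # about 0.3922%, i.e. fewer than 1 file in 255
# Over 99% of n-byte files therefore cannot be compressed at all, let alone
# by more than one byte once the decompressor's size is also counted.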
Mathematical background
Any compression algorithm can be viewed as a function that maps sequences of units (normally octets) into other sequences of the same units. Compression is successful if the resulting sequence, together with the map needed to decompress it, is shorter than the original sequence.
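A minimal sketch of this view (added for illustration, with zlib used as one concrete instance of such a map; the names compress and decompress are placeholders, not an API from the text):

import zlib

def compress(data: bytes) -> bytes:
    # One concrete instance of the abstract map on octet sequences.
    return zlib.compress(data)

def decompress(data: bytes) -> bytes:
    # The inverse map; composing it with compress must return the input exactly.
    return zlib.decompress(data)

original = b"an octet sequence " * 64
result = compress(original)

assert decompress(result) == original  # losslessness: the map is invertible
# Success also has to account for the size of the decompression map itself,
# not just the lengths of the two sequences.
print(len(result) < len(original))     # True for this redundant input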