Biomedical Engineering Reference
In-Depth Information
8.1. Algorithmic Complexity
Consider a collection of measured data-values, stored in digitized form on a
computer. We would like to say that they are complex if they are hard to de-
scribe, and measure their complexity by the difficulty of describing them. The
central idea of Kolmogorov complexity (proposed independently by Solomonoff
(176) and Chaitin) is that one can describe the data set by writing a program
which will reproduce the data. The difficulty of description is then measured by
the length of the program. Anyone with much experience of other people's code
will appreciate that it is always possible to write a longer, slower program to do
a given job, so what we are really interested in is the shortest program that can
exactly replicate the data.
To introduce some symbols, let x be the data, and |x| their size in bits. The
Kolmogorov or algorithmic complexity of x, K(x), is the length of the shortest
program that will output x and then stop. 21 Clearly, there is always some pro-
gram which will output x and then stop, for instance, "print(x); end." Thus
K(x) ≤ |x| + c, where c is the length of the print and end instructions. This is what
one might call a literal description of the data. If one cannot do better than this,
i.e., if K(x) ≈ |x|, then x is highly complex. Some data, however, are highly com-
pressible. For instance, if x consists of the second quadrillion digits of π, a very
short program suffices to generate it. 22
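A spigot algorithm for the digits of π would make this concrete but is lengthy; as a simpler stand-in (my example, not the text's), here is a fixed-size program whose output can be made as large as we like, using the binary Champernowne sequence:

```python
def champernowne_bits(n: int) -> str:
    """First n bits of 1 10 11 100 101 ... (binary Champernowne sequence)."""
    out, total, i = [], 0, 1
    while total < n:
        b = format(i, "b")   # binary expansion of i
        out.append(b)
        total += len(b)
        i += 1
    return "".join(out)[:n]

x = champernowne_bits(1_000_000)
# This program is a few hundred characters no matter how large n is,
# so K(x) grows like O(log n), while the literal "print(x)" grows like n.
```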
As you may already suspect, the number of simple data sets is quite limited.
Suppose we have a data set of size n bits, and we want to compress it by k bits,
i.e., find a program for it which is n - k bits long. There are at most 2^(n-k) programs
of that length, so of all the 2^n data sets of size n, the fraction that can be com-
pressed by k bits is at most 2^(-k). The precise degree of compression does not mat-
ter: when we look at large data sets, almost all of them are highly complex. If
we pick a large data set at random, then the odds are very good that it will be
complex. We can state this more exactly if we think about our data as consisting
of the first n measurements from some sequence, and let n grow. That is, x = x_1^n,
and we are interested in the asymptotic behavior of K(x_1^n). If the measurements
x_i are independent and identically distributed (IID), then K(x_1^n)/|x| → 1 almost
surely; IID sequences are incompressible. If x is a realization of a stationary
(but not necessarily IID) random process X, then (177,10)
 
    lim_{n → ∞} E[ K(X_1^n) / n ] = h(X),    [55]

the entropy rate (§7) of X . Thus, random data has high complexity, and the
complexity of a random process grows at a rate that just measures its unpredict-
ability.
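Equation [55] can be illustrated (though not proved) with a compressor standing in for K: for an IID biased coin, a lossless coder's per-symbol output length cannot fall below the entropy rate on average, and a good coder approaches it for large n. A sketch, with zlib as the stand-in and parameters of my own choosing:

```python
import math
import random
import zlib

# IID biased coin with p = P(1); entropy rate = per-symbol Shannon entropy.
p = 0.1
h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)   # about 0.47 bits/symbol

rng = random.Random(0)
n = 200_000
bits = bytes(1 if rng.random() < p else 0 for _ in range(n))

# Pack 8 symbols per byte so zlib sees the raw bit stream, not one
# symbol per byte.
packed = bytes(
    sum(bits[i + j] << j for j in range(8)) for i in range(0, n, 8)
)

# Compressed bits per symbol: an upper-bound proxy for K(X_1^n)/n in [55].
rate = 8 * len(zlib.compress(packed, 9)) / n
print(f"entropy rate h(X) = {h:.3f}, zlib rate = {rate:.3f}")
```

The observed rate sits above h(X), as it must for any lossless code, but well below the 1 bit/symbol of the literal description; a better entropy coder would come closer still.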
 