Biomedical Engineering Reference
In-Depth Information
8.1. Algorithmic Complexity
Consider a collection of measured data-values, stored in digitized form on a
computer. We would like to say that they are complex if they are hard to de-
scribe, and measure their complexity by the difficulty of describing them. The
central idea of Kolmogorov complexity (proposed independently by Solomonoff
(176) and Chaitin) is that one can describe the data set by writing a program
which will reproduce the data. The difficulty of description is then measured by
the length of the program. Anyone with much experience of other people's code
will appreciate that it is always possible to write a longer, slower program to do
a given job, so what we are really interested in is the shortest program that can
exactly replicate the data.
To introduce some symbols, let x be the data, and |x| their size in bits. The
Kolmogorov or algorithmic complexity of x, K(x), is the length of the shortest
program that will output x and then stop. 21 Clearly, there is always some pro-
gram which will output x and then stop, for instance, "print(x); end." Thus
K(x) ≤ |x| + c, where c is the length of the print and end instructions. This is what
one might call a literal description of the data. If one cannot do better than this,
i.e., if K(x) ≈ |x|, then x is highly complex. Some data, however, are highly com-
pressible. For instance, if x consists of the second quadrillion digits of π, a very
short program suffices to generate it. 22
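A spigot algorithm for the digits of π would make this concrete but is lengthy; as a simpler stand-in (my example, not the text's), here is a fixed-size program whose output can be made as large as we like, using the binary Champernowne sequence:

```python
def champernowne_bits(n: int) -> str:
    """First n bits of 1 10 11 100 101 ... (binary Champernowne sequence)."""
    out, total, i = [], 0, 1
    while total < n:
        b = format(i, "b")   # binary expansion of i
        out.append(b)
        total += len(b)
        i += 1
    return "".join(out)[:n]

x = champernowne_bits(1_000_000)
# This program is a few hundred characters no matter how large n is,
# so K(x) grows like O(log n), while the literal "print(x)" grows like n.
```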
As you may already suspect, the number of simple data sets is quite limited.
Suppose we have a data set of size n bits, and we want to compress it by k bits,
i.e., find a program for it which is n - k bits long. There are at most 2^(n-k) programs
of that length, so of all the 2^n data sets of size n, the fraction that can be com-
pressed by k bits is at most 2^(-k). The precise degree of compression does not mat-
ter: when we look at large data sets, almost all of them are highly complex. If
we pick a large data set at random, then the odds are very good that it will be
complex. We can state this more exactly if we think about our data as consisting
of the first n measurements from some sequence, and let n grow. That is, x = x_1^n,
and we are interested in the asymptotic behavior of K(x_1^n). If the measurements
x_i are independent and identically distributed (IID), then K(x_1^n)/|x| → 1 almost
surely; IID sequences are incompressible. If x is a realization of a stationary
(but not necessarily IID) random process X, then (177,10)
 
    lim_{n → ∞} E[ K(X_1^n) / n ] = h(X),    [55]

the entropy rate (§7) of X . Thus, random data has high complexity, and the
complexity of a random process grows at a rate that just measures its unpredict-
ability.
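Equation [55] can be illustrated (though not proved) with a compressor standing in for K: for an IID biased coin, a lossless coder's per-symbol output length cannot fall below the entropy rate on average, and a good coder approaches it for large n. A sketch, with zlib as the stand-in and parameters of my own choosing:

```python
import math
import random
import zlib

# IID biased coin with p = P(1); entropy rate = per-symbol Shannon entropy.
p = 0.1
h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)   # about 0.47 bits/symbol

rng = random.Random(0)
n = 200_000
bits = bytes(1 if rng.random() < p else 0 for _ in range(n))

# Pack 8 symbols per byte so zlib sees the raw bit stream, not one
# symbol per byte.
packed = bytes(
    sum(bits[i + j] << j for j in range(8)) for i in range(0, n, 8)
)

# Compressed bits per symbol: an upper-bound proxy for K(X_1^n)/n in [55].
rate = 8 * len(zlib.compress(packed, 9)) / n
print(f"entropy rate h(X) = {h:.3f}, zlib rate = {rate:.3f}")
```

The observed rate sits above h(X), as it must for any lossless code, but well below the 1 bit/symbol of the literal description; a better entropy coder would come closer still.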
 