Grassberger did not provide a procedure for finding the maximally predictive models, nor for minimizing the information required among them. He did, however, make the following observation. A basic result of information theory, called the data-processing inequality, says that I[A; B] ≥ I[f(A); B] for any variables A and B: we cannot get more information out of data by processing it than was in there to begin with. Since the state of the predictor is a function of the past, it follows that I[X⁻; X⁺] ≥ I[f(X⁻); X⁺]. Presumably, for optimal predictors, the two informations are equal; the predictor's state is just as informative as the original data. (Otherwise, the model would be missing some potential predictive power.) But another basic inequality is that H[A] ≥ I[A; B]: no variable contains more information about another than it does about itself. So, for optimal models, H[f(X⁻)] ≥ I[X⁻; X⁺]. The latter quantity, which Grassberger called the effective measure complexity, can be estimated purely from data, without intervening models. This quantity, the mutual information between the past and the future, has been rediscovered many times, in many contexts, and called excess entropy (in statistical mechanics), stored information (189), complexity (190-192), or predictive information (193); the last name is perhaps the clearest.
Since it quantifies the degree of statistical dependence between the past and the
future, it is clearly appealing as a measure of complexity.
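As an illustration of how such an estimate might work in practice, here is a minimal sketch using a plug-in (empirical-frequency) estimator of the mutual information between finite past and future blocks of a symbolic series. The block length k, the helper name predictive_information, and the toy sequences are illustrative assumptions, not anything specified in the text.

```python
# Minimal sketch: plug-in estimate of the predictive information
# I[past; future] of a symbolic sequence, using length-k blocks as
# finite stand-ins for the full past and future.
from collections import Counter
from math import log2

def predictive_information(seq, k=3):
    """Estimate I[past_k; future_k] in bits from a symbol sequence."""
    pasts, futures, joints = Counter(), Counter(), Counter()
    n = 0
    for i in range(k, len(seq) - k + 1):
        past, future = tuple(seq[i - k:i]), tuple(seq[i:i + k])
        pasts[past] += 1
        futures[future] += 1
        joints[past, future] += 1
        n += 1
    # I[A; B] = sum over (a, b) of p(a, b) * log2(p(a, b) / (p(a) * p(b)))
    return sum(c / n * log2((c / n) / ((pasts[p] / n) * (futures[f] / n)))
               for (p, f), c in joints.items())

print(predictive_information([0, 1] * 500))   # ~1 bit: period 2, past fixes future
import random
print(predictive_information([random.randint(0, 1) for _ in range(5000)]))
# near 0 for IID data (plug-in estimates carry a small upward bias)
```

The plug-in estimator is the crudest possible choice; the point is only that the quantity is computable from data alone, with no model standing in between.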
Grassberger-Crutchfield-Young Statistical Complexity. The forecasting complexity notion was made fully operational by Crutchfield and Young (102,194), who provided an effective procedure for finding the minimal maximally predictive model and its states. They began by defining the causal states of a process, as follows. For each history x⁻, there is some conditional distribution of future observations, Pr(X⁺ | x⁻). Two histories x₁⁻ and x₂⁻ are equivalent if Pr(X⁺ | x₁⁻) = Pr(X⁺ | x₂⁻). Write the set of all histories equivalent to x⁻ as [x⁻]. We now have a function F that maps each history into its equivalence class: F(x⁻) = [x⁻]. Clearly, Pr(X⁺ | F(x⁻)) = Pr(X⁺ | x⁻). Crutchfield and Young accordingly proposed to forget the particular history and retain only its equivalence class, which they claimed would involve no loss of predictive power; this was later proved to be correct (195, theorem 1). They called the equivalence classes the "causal states" of the process, and claimed that these were the simplest states with maximal predictive power; this, too, turned out to be right (195, theorem 2). Finally, one can show that the causal states are the unique optimal states (195, theorem 3); any other optimal predictor is really a disguised version of the causal states. Accordingly, they defined the statistical complexity C of a process as the information content of its causal states.
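To make the construction concrete, the following sketch (illustrative only, and not Crutchfield and Young's actual procedure) reconstructs approximate causal states from finite data: histories of a fixed length L stand in for full pasts, and two histories are merged when their empirical next-symbol distributions agree after rounding. The name causal_states, the length L, and the rounding tolerance are all assumptions of the sketch.

```python
# Minimal sketch: approximate causal states from finite data, merging
# length-L histories whose empirical next-symbol distributions agree.
from collections import Counter, defaultdict
from math import log2

def causal_states(seq, L=3, digits=1):
    """Return (statistical complexity C in bits, states) for a symbol sequence."""
    nxt = defaultdict(Counter)   # history -> counts of the symbol that follows it
    occupation = Counter()       # how often each history occurs
    for i in range(L, len(seq)):
        h = tuple(seq[i - L:i])
        nxt[h][seq[i]] += 1
        occupation[h] += 1
    # Merge histories with (approximately) the same Pr(next | history);
    # rounding is a crude stand-in for a proper test of equality.
    states = defaultdict(list)
    for h, counts in nxt.items():
        total = sum(counts.values())
        signature = tuple(sorted((s, round(c / total, digits))
                                 for s, c in counts.items()))
        states[signature].append(h)
    # C = entropy of the causal-state occupation distribution.
    n = sum(occupation.values())
    probs = [sum(occupation[h] for h in hs) / n for hs in states.values()]
    return -sum(p * log2(p) for p in probs), list(states.values())

C, _ = causal_states([0, 1, 2] * 400)
print(C)  # ~log2(3) = 1.58 bits: the three phases are the three causal states
```

On a deterministic periodic sequence the merging is exact; for genuinely stochastic processes the rounding threshold does real work, and a serious implementation would replace it with a statistical test for equality of the conditional distributions.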
Because the causal states are purely an objective property of the process
being considered, C is too; it does not depend at all on our modeling or means of
description. It is equal to the length of the shortest description of the past that is
relevant to the actual dynamics of the system. As we argued above should be the case, it is exactly 0 for IID sequences, and log p for sequences of period p.
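For example, a binary sequence of period 2 (...010101...) has two causal states, one per phase, each occupied half the time, so C = log 2, or one bit when the logarithm is taken base 2; an IID sequence has a single causal state, so C = 0.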