Grassberger did not provide a procedure for finding the maximally predictive models, nor for minimizing the information required among them. He did, however, make the following observation. A basic result of information theory, called the data-processing inequality, says that I[A; B] ≥ I[f(A); B] for any variables A and B: we cannot get more information out of data by processing it than was in there to begin with. Since the state of the predictor is a function of the past, it follows that I[X⁻; X⁺] ≥ I[f(X⁻); X⁺]. Presumably, for optimal predictors, the two informations are equal; the predictor's state is just as informative as the original data. (Otherwise, the model would be missing some potential predictive power.) But another basic inequality is that H[A] ≥ I[A; B]: no variable contains more information about another than it does about itself. So, for optimal models, H[f(X⁻)] ≥ I[X⁻; X⁺]. The latter quantity, which Grassberger called the effective measure complexity, can be estimated purely from data, without intervening models. This quantity, the mutual information between the past and the future, has been rediscovered many times, in many contexts, and called excess entropy (in statistical mechanics), stored information (189), complexity (190-192), or predictive information (193); the last name is perhaps the clearest.
Since it quantifies the degree of statistical dependence between the past and the
future, it is clearly appealing as a measure of complexity.
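As an illustration of how such an estimate might work in practice, here is a minimal sketch using a plug-in (empirical-frequency) estimator of the mutual information between finite past and future blocks of a symbolic series. The block length k, the helper name predictive_information, and the toy sequences are illustrative assumptions, not anything specified in the text.

```python
# Minimal sketch: plug-in estimate of the predictive information
# I[past; future] of a symbolic sequence, using length-k blocks as
# finite stand-ins for the full past and future.
from collections import Counter
from math import log2

def predictive_information(seq, k=3):
    """Estimate I[past_k; future_k] in bits from a symbol sequence."""
    pasts, futures, joints = Counter(), Counter(), Counter()
    n = 0
    for i in range(k, len(seq) - k + 1):
        past, future = tuple(seq[i - k:i]), tuple(seq[i:i + k])
        pasts[past] += 1
        futures[future] += 1
        joints[past, future] += 1
        n += 1
    # I[A; B] = sum over (a, b) of p(a, b) * log2(p(a, b) / (p(a) * p(b)))
    return sum(c / n * log2((c / n) / ((pasts[p] / n) * (futures[f] / n)))
               for (p, f), c in joints.items())

print(predictive_information([0, 1] * 500))   # ~1 bit: period 2, past fixes future
import random
print(predictive_information([random.randint(0, 1) for _ in range(5000)]))
# near 0 for IID data (plug-in estimates carry a small upward bias)
```

The plug-in estimator is the crudest possible choice; the point is only that the quantity is computable from data alone, with no model standing in between.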
Grassberger-Crutchfield-Young Statistical Complexity. The forecasting complexity notion was made fully operational by Crutchfield and Young (102,194), who provided an effective procedure for finding the minimal maximally predictive model and its states. They began by defining the causal states of a process, as follows. For each history x⁻, there is some conditional distribution of future observations, Pr(X⁺ | x⁻). Two histories x₁⁻ and x₂⁻ are equivalent if Pr(X⁺ | x₁⁻) = Pr(X⁺ | x₂⁻). Write the set of all histories equivalent to x⁻ as [x⁻]. We now have a function F that maps each history into its equivalence class: F(x⁻) = [x⁻]. Clearly, Pr(X⁺ | F(x⁻)) = Pr(X⁺ | x⁻). Crutchfield and Young accordingly proposed to forget the particular history and retain only its equivalence class, which they claimed would involve no loss of predictive power; this was later proved to be correct (195, theorem 1). They called the equivalence classes the "causal states" of the process, and claimed that these were the simplest states with maximal predictive power; this, too, turned out to be right (195, theorem 2). Finally, one can show that the causal states are the unique optimal states (195, theorem 3); any other optimal predictor is really a disguised version of the causal states. Accordingly, they defined the statistical complexity C of a process as the information content of its causal states.
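To make the construction concrete, the following sketch (illustrative only, and not Crutchfield and Young's actual procedure) reconstructs approximate causal states from finite data: histories of a fixed length L stand in for full pasts, and two histories are merged when their empirical next-symbol distributions agree after rounding. The name causal_states, the length L, and the rounding tolerance are all assumptions of the sketch.

```python
# Minimal sketch: approximate causal states from finite data, merging
# length-L histories whose empirical next-symbol distributions agree.
from collections import Counter, defaultdict
from math import log2

def causal_states(seq, L=3, digits=1):
    """Return (statistical complexity C in bits, states) for a symbol sequence."""
    nxt = defaultdict(Counter)   # history -> counts of the symbol that follows it
    occupation = Counter()       # how often each history occurs
    for i in range(L, len(seq)):
        h = tuple(seq[i - L:i])
        nxt[h][seq[i]] += 1
        occupation[h] += 1
    # Merge histories with (approximately) the same Pr(next | history);
    # rounding is a crude stand-in for a proper test of equality.
    states = defaultdict(list)
    for h, counts in nxt.items():
        total = sum(counts.values())
        signature = tuple(sorted((s, round(c / total, digits))
                                 for s, c in counts.items()))
        states[signature].append(h)
    # C = entropy of the causal-state occupation distribution.
    n = sum(occupation.values())
    probs = [sum(occupation[h] for h in hs) / n for hs in states.values()]
    return -sum(p * log2(p) for p in probs), list(states.values())

C, _ = causal_states([0, 1, 2] * 400)
print(C)  # ~log2(3) = 1.58 bits: the three phases are the three causal states
```

On a deterministic periodic sequence the merging is exact; for genuinely stochastic processes the rounding threshold does real work, and a serious implementation would replace it with a statistical test for equality of the conditional distributions.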
Because the causal states are purely an objective property of the process
being considered, C is too; it does not depend at all on our modeling or means of
description. It is equal to the length of the shortest description of the past that is
relevant to the actual dynamics of the system. As we argued above should be the case, it is exactly 0 for IID sequences, and log p for sequences of period p.
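For example, a binary sequence of period 2 (...010101...) has two causal states, one per phase, each occupied half the time, so C = log 2, or one bit when the logarithm is taken base 2; an IID sequence has a single causal state, so C = 0.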