is the relation of "closeness" among trajectories—two trajectories that are close
in the state space will be close in the embedding space, and vice versa. This
leads to a popular and robust scheme for nonlinear prediction, the method of
analogs: when one wants to predict the next step of the series, take the current
point in the embedding space, find a similar one with a known successor, and
predict that the current point will do the analogous thing. Many refinements are
possible, such as taking a weighted average of nearest neighbors, or selecting an
analog at random, with a probability decreasing rapidly with distance. Alternately, one can simply fit non-parametric predictors on the embedding space.
(See (69) for a review.) Closely related is the idea of noise reduction, using the structure of the embedding space to filter out some of the effects of measurement noise. This can work even when the statistical character of the noise is unknown (see (69) again).
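As a concrete illustration of the method of analogs, a minimal sketch in Python might look like the following; the embedding dimension k, lag tau, number of neighbors, and the inverse-distance weighting are all illustrative choices, not prescribed by the text.

```python
import numpy as np

def delay_embed(x, k, tau):
    """k-dimensional delay embedding of a scalar series x with lag tau."""
    n = len(x) - (k - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(k)])

def predict_next(x, k, tau, n_neighbors=3):
    """Method of analogs: average the successors of the nearest known analogs."""
    x = np.asarray(x, dtype=float)
    emb = delay_embed(x, k, tau)
    query = emb[-1]                        # current point in embedding space
    candidates = emb[:-1]                  # earlier points, whose successors are known
    dists = np.linalg.norm(candidates - query, axis=1)
    nearest = np.argsort(dists)[:n_neighbors]
    successors = x[nearest + (k - 1) * tau + 1]   # next observation after each analog
    weights = 1.0 / (dists[nearest] + 1e-12)      # weight analogs by inverse distance
    return np.average(successors, weights=weights)
```

Calling, say, predict_next(x, k=3, tau=2) forecasts the observation following the end of x; using only the single nearest neighbor, or sampling one analog with probability decreasing in distance, are the other variants mentioned above.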
Determining the number of lags, and the lag itself, is a problem of model
selection, just as in §2, and can be approached in that spirit. An obvious approach is to minimize the in-sample forecasting error, as with ARMA models; recent work along these lines (70,71) uses the minimum description length principle (described in §8.3.1 below) to control over-fitting. A more common procedure for determining the embedding dimension, however, is the false nearest
neighbor method (72). The idea is that if the current embedding dimension k is
sufficient to resolve the dynamics, then k + 1 would be, too, and the reconstructed
state space will not change very much. In particular, points which were close
together in the dimension-k embedding should remain close in the dimension-(k+1) embedding. Conversely, if the embedding dimension is too small, points that are really far apart will be brought artificially close together (just as projecting a sphere onto a disk brings together points on opposite sides of the sphere). The
particular algorithm of Kennel et al. (72), which has proved very practical, is to
take each point in the k-dimensional embedding, find its nearest neighbor in that embedding, and then calculate the distance between them. One then calculates how much further apart they would be if one used a (k+1)-dimensional embedding. If this extra distance is more than a certain fixed multiple of the original
distance, they are said to be "false nearest neighbors." (Ratios of 2 to 15 are
common, but the precise value does not seem to matter very much.) One then
repeats the process at dimension k + 1, stopping when the proportion of false
nearest neighbors becomes zero, or at any rate sufficiently small. Here, the loss
function used to guide model selection is the number of false nearest neighbors,
and the standard prescriptions amount to empirical risk minimization. One reason simple ERM works well here is that the problem is intrinsically finite-dimensional (via the Takens result).
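To make the procedure concrete, here is a rough sketch of the false-nearest-neighbor test along the lines of the description above; the threshold ratio and stopping tolerance are illustrative parameters, and the published algorithm of Kennel et al. (72) includes further criteria not shown here.

```python
import numpy as np

def false_neighbor_fraction(x, k, tau, ratio=10.0):
    """Fraction of nearest neighbors in the k-dim embedding that become
    'false' (separate by more than `ratio` times their distance) at k+1."""
    x = np.asarray(x, dtype=float)
    n = len(x) - k * tau                   # points usable in both embeddings
    emb_k = np.column_stack([x[i * tau : i * tau + n] for i in range(k)])
    extra = x[k * tau : k * tau + n]       # coordinate added at dimension k+1
    n_false = 0
    for i in range(n):
        d = np.linalg.norm(emb_k - emb_k[i], axis=1)
        d[i] = np.inf                      # exclude the point itself
        j = np.argmin(d)                   # nearest neighbor in k dimensions
        if abs(extra[i] - extra[j]) > ratio * d[j]:
            n_false += 1                   # the new coordinate pushes them apart
    return n_false / n

def choose_embedding_dimension(x, tau, k_max=10, tol=0.01):
    """Increase k until the false-neighbor fraction is sufficiently small."""
    for k in range(1, k_max + 1):
        if false_neighbor_fraction(x, k, tau) < tol:
            return k
    return k_max
```

The ratio parameter plays the role of the fixed multiple mentioned above; as noted, its precise value does not seem to matter very much.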
Unfortunately, the data required for reliable calculation of quantities like dimensions and exponents can be quite voluminous. Approximately $10^{2+0.4D}$ data points are necessary to adequately reconstruct an attractor of dimension D (73, pp. 317-319); for D = 5, for example, this comes to about $10^4$ points. (Even this is more optimistic than the widely quoted,