3.6. Symbolic or Categorical Time Series
The methods we have considered so far are intended for time series taking
continuous values. An alternative is to break the range of the time series into
discrete categories (generally only finitely many of them); these categories are
sometimes called symbols, and the study of such series, symbolic dynamics.
Modeling and prediction then reduce to a (perhaps more tractable) problem
in discrete probability, and many methods can be used that are simply inapplicable
to continuous-valued series (10). Of course, if a bad discretization is chosen,
the results of such methods are pretty well meaningless, but sometimes one gets
data that are already nicely discrete: human languages, the sequences of
biopolymers, neuronal spike trains, etc. We shall return to the issue of discretization
below, but for the moment we will simply consider the applicable methods for
discrete-valued, discrete-time series, however obtained.
Formally, we take a continuous variable z and partition its range into a
number of discrete cells, each labeled by a different symbol from some
alphabet; the partition gives us a discrete variable y = G(z). A word or string is just a
sequence of symbols, y_0 y_1 ... y_n. A time series z_0^n naturally generates a string
G(z_0^n) = G(z_0) G(z_1) ... G(z_n). In general, not every possible string can actually be
generated by the dynamics of the system we're considering. The set of allowed
sequences is called the language. A sequence that is never generated is said to
be forbidden. In a slightly inconsistent metaphor, the rules that specify the
allowed words of a language are called its grammar. To each grammar there
corresponds an abstract machine or automaton that can determine whether a given
word belongs to the language, or, equivalently, generate all and only the allowed
words of the language. The generative versions of these automata are stochastic,
i.e., they generate different words with different probabilities, matching the
statistics of G(z).
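As a concrete illustration (a minimal sketch, not from the text), the Python snippet below takes the logistic map at r = 4 as the continuous system, uses the binary partition at z = 1/2 as G, and tabulates empirical word frequencies; the map, the threshold, and the word length are all assumptions chosen for this example.

import numpy as np
from collections import Counter

def logistic_trajectory(z0, n, r=4.0):
    # Iterate the logistic map z -> r z (1 - z) for n steps.
    z = np.empty(n)
    z[0] = z0
    for t in range(1, n):
        z[t] = r * z[t - 1] * (1.0 - z[t - 1])
    return z

def symbolize(z, threshold=0.5):
    # The partition G: symbol '1' if z >= threshold, else '0'.
    return ''.join('1' if x >= threshold else '0' for x in z)

z = logistic_trajectory(0.3, 100_000)
s = symbolize(z)

# Empirical probabilities of the words of length 3.
L = 3
counts = Counter(s[i:i + L] for i in range(len(s) - L + 1))
total = sum(counts.values())
for w in sorted(counts):
    print(w, counts[w] / total)

For this particular map and partition the symbol sequence behaves statistically like a fair coin, so every length-3 word should appear with frequency near 1/8; a forbidden word would announce itself as a count of zero.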
By imposing restrictions on the forms the grammatical rules can take, or,
equivalently, on the memory available to the automaton, we can divide all
languages into four nested classes, a hierarchical classification due to Chomsky
(91). At the bottom are the members of the weakest, most restricted class, the
regular languages, generated by automata with only a fixed, finite memory for
past symbols (finite state machines). Above them are the context-free
languages, whose grammars do not depend on context; the corresponding machines
are stack automata, which can store an unlimited number of symbols in their
memory, but on a strictly last-in, first-out basis. Then come the context-
sensitive languages; and at the very top, the unrestricted languages, generated
by universal computers. Each stage in the hierarchy can simulate all those
beneath it.
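The bottom two levels can be made concrete with a short sketch (again an illustration, not from the text; the two example languages are standard ones chosen here). The "golden mean" language, binary strings with no two consecutive 1s, is regular: one bit of memory suffices. The language a^n b^n is context-free but not regular: recognizing it needs a stack, pushed and popped last-in, first-out.

def accepts_golden_mean(word):
    # Finite-state recognizer for a regular language: binary strings
    # with no two consecutive 1s ('11' is the sole forbidden word).
    # The machine's whole memory is one bit: was the last symbol a 1?
    prev_one = False
    for sym in word:
        if sym == '1' and prev_one:
            return False
        prev_one = (sym == '1')
    return True

def accepts_anbn(word):
    # Stack-based recognizer for the context-free language a^n b^n:
    # each 'a' is pushed, and each 'b' pops the most recently pushed
    # 'a' (last-in, first-out). No fixed finite memory suffices,
    # since n is unbounded.
    stack = []
    seen_b = False
    for sym in word:
        if sym == 'a':
            if seen_b:
                return False  # an 'a' after a 'b' is forbidden
            stack.append('a')
        elif sym == 'b':
            seen_b = True
            if not stack:
                return False  # more b's than a's
            stack.pop()
        else:
            return False
    return not stack  # accept only if every 'a' was matched

print(accepts_golden_mean('01010'), accepts_golden_mean('0110'))  # True False
print(accepts_anbn('aabb'), accepts_anbn('aab'))                  # True False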
We may seem to have departed very far from dynamics, but actually this is
not so. Because different language classes are distinguished by different kinds
of memory, they have very different correlation properties (§3.2), mutual
information ...