3.6. Symbolic or Categorical Time Series
The methods we have considered so far are intended for time series taking
continuous values. An alternative is to break the range of the time series into
discrete categories (generally only finitely many of them); these categories are
sometimes called symbols, and the study of such series, symbolic dynamics.
Modeling and prediction then reduce to a (perhaps more tractable) problem
in discrete probability, and many methods can be used that are simply inapplicable
to continuous-valued series (10). Of course, if a bad discretization is chosen,
the results of such methods are pretty well meaningless, but sometimes one gets
data that are already nicely discrete: human languages, the sequences of
biopolymers, neuronal spike trains, etc. We shall return to the issue of discretization
below, but for the moment we will simply consider the applicable methods for
discrete-valued, discrete-time series, however obtained.
Formally, we take a continuous variable z and partition its range into a
number of discrete cells, each labeled by a different symbol from some
alphabet; the partition gives us a discrete variable y = G(z). A word or string is just a
sequence of symbols, y_0 y_1 ... y_n. A time series z_0^n naturally generates a string
G(z_0^n) = G(z_0) G(z_1) ... G(z_n). In general, not every possible string can actually be
generated by the dynamics of the system we're considering. The set of allowed
sequences is called the language. A sequence that is never generated is said to
be forbidden. In a slightly inconsistent metaphor, the rules that specify the
allowed words of a language are called its grammar. To each grammar there
corresponds an abstract machine or automaton that can determine whether a given
word belongs to the language, or, equivalently, generate all and only the allowed
words of the language. The generative versions of these automata are stochastic,
i.e., they generate different words with different probabilities, matching the
statistics of G(z).
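As a concrete illustration (a minimal sketch, not from the text), the Python snippet below takes the logistic map at r = 4 as the continuous system, uses the binary partition at z = 1/2 as G, and tabulates empirical word frequencies; the map, the threshold, and the word length are all assumptions chosen for this example.

import numpy as np
from collections import Counter

def logistic_trajectory(z0, n, r=4.0):
    # Iterate the logistic map z -> r z (1 - z) for n steps.
    z = np.empty(n)
    z[0] = z0
    for t in range(1, n):
        z[t] = r * z[t - 1] * (1.0 - z[t - 1])
    return z

def symbolize(z, threshold=0.5):
    # The partition G: symbol '1' if z >= threshold, else '0'.
    return ''.join('1' if x >= threshold else '0' for x in z)

z = logistic_trajectory(0.3, 100_000)
s = symbolize(z)

# Empirical probabilities of the words of length 3.
L = 3
counts = Counter(s[i:i + L] for i in range(len(s) - L + 1))
total = sum(counts.values())
for w in sorted(counts):
    print(w, counts[w] / total)

For this particular map and partition the symbol sequence behaves statistically like a fair coin, so every length-3 word should appear with frequency near 1/8; a forbidden word would announce itself as a count of zero.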
By imposing restrictions on the forms the grammatical rules can take, or,
equivalently, on the memory available to the automaton, we can divide all
languages into four nested classes, a hierarchical classification due to Chomsky
(91). At the bottom are the members of the weakest, most restricted class, the
regular languages, generated by automata with only a fixed, finite memory for
past symbols (finite state machines). Above them are the context-free
languages, whose grammars do not depend on context; the corresponding machines
are stack automata, which can store an unlimited number of symbols in their
memory, but on a strictly last-in, first-out basis. Then come the context-
sensitive languages; and at the very top, the unrestricted languages, generated
by universal computers. Each stage in the hierarchy can simulate all those
beneath it.
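The bottom two levels can be made concrete with a short sketch (again an illustration, not from the text; the two example languages are standard ones chosen here). The "golden mean" language, binary strings with no two consecutive 1s, is regular: one bit of memory suffices. The language a^n b^n is context-free but not regular: recognizing it needs a stack, pushed and popped last-in, first-out.

def accepts_golden_mean(word):
    # Finite-state recognizer for a regular language: binary strings
    # with no two consecutive 1s ('11' is the sole forbidden word).
    # The machine's whole memory is one bit: was the last symbol a 1?
    prev_one = False
    for sym in word:
        if sym == '1' and prev_one:
            return False
        prev_one = (sym == '1')
    return True

def accepts_anbn(word):
    # Stack-based recognizer for the context-free language a^n b^n:
    # each 'a' is pushed, and each 'b' pops the most recently pushed
    # 'a' (last-in, first-out). No fixed finite memory suffices,
    # since n is unbounded.
    stack = []
    seen_b = False
    for sym in word:
        if sym == 'a':
            if seen_b:
                return False  # an 'a' after a 'b' is forbidden
            stack.append('a')
        elif sym == 'b':
            seen_b = True
            if not stack:
                return False  # more b's than a's
            stack.pop()
        else:
            return False
    return not stack  # accept only if every 'a' was matched

print(accepts_golden_mean('01010'), accepts_golden_mean('0110'))  # True False
print(accepts_anbn('aabb'), accepts_anbn('aab'))                  # True False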
We may seem to have departed very far from dynamics, but actually this is
not so. Because different language classes are distinguished by different kinds
of memory, they have very different correlation properties (§3.2), mutual
information ...