formation functions (§7), and so forth—see (10) for details. Moreover, it is often
easier to determine these properties from a system's grammar than from direct
examination of sequence statistics, especially since specialized techniques are
available for grammatical inference (92,93).
3.6.1. Hidden Markov Models
The most important special case of this general picture is that of regular
languages. These, we said, are generated by machines with only a finite memory. More exactly, there is a finite set of states x, with two properties (a generative sketch in code follows the list):
1. The distribution of y_t depends solely on x_t, and
2. The distribution of x_{t+1} depends solely on x_t.
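A minimal generative sketch of these two properties in Python, using a made-up two-state chain; the transition matrix T, emission matrix E, and all numerical values are illustrative assumptions, not values from the text:

    import numpy as np

    rng = np.random.default_rng(0)

    T = np.array([[0.9, 0.1],   # T[i, j] = P(x_{t+1} = j | x_t = i)
                  [0.2, 0.8]])
    E = np.array([[0.7, 0.3],   # E[i, k] = P(y_t = k | x_t = i)
                  [0.1, 0.9]])

    x, ys = 0, []
    for _ in range(10):
        ys.append(rng.choice(2, p=E[x]))  # property 1: y_t drawn from x_t alone
        x = rng.choice(2, p=T[x])         # property 2: x_{t+1} drawn from x_t alone

The hidden x values are discarded here; only the noisy sequence ys would be observed.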
That is, the x sequence is a Markov chain, and the observed y sequence is a
noisy function of that chain. Such models are very familiar in signal processing
(94), bioinformatics (95), and elsewhere, under the name of hidden Markov
models (HMMs). They can be thought of as a generalization of ordinary
Markov chains to the state-space picture described in §3.1. HMMs are particularly useful in filtering applications, since very efficient algorithms exist for determining the most probable values of x from the observed sequence y. The expectation-maximization (EM) algorithm (96) even allows us to simultaneously infer the most probable hidden states and the most probable parameters for the model.
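The standard dynamic program for recovering the most probable state path is the Viterbi algorithm. Below is a minimal sketch, reusing the illustrative T and E from the sketch above and assuming a uniform initial distribution pi; it is an illustration under those assumptions, not the reference implementation of (94) or (96):

    import numpy as np

    def viterbi(ys, T, E, pi):
        # log-probability of the best path ending in each state at time t
        logp = np.log(pi) + np.log(E[:, ys[0]])
        back = []                                # backpointers for path recovery
        for y in ys[1:]:
            cand = logp[:, None] + np.log(T)     # cand[i, j]: best path into j via i
            back.append(cand.argmax(axis=0))
            logp = cand.max(axis=0) + np.log(E[:, y])
        path = [int(logp.argmax())]              # best final state...
        for bp in reversed(back):
            path.append(int(bp[path[-1]]))       # ...traced backwards
        return path[::-1]

    # e.g. viterbi([0, 1, 1, 0], T, E, np.array([0.5, 0.5]))

The EM (Baum-Welch) procedure mentioned above alternates a closely related dynamic program, the forward-backward recursions, with re-estimation of T and E.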
3.6.2. Variable-Length Markov Models
The main limitation of ordinary HMM methods, even the EM algorithm, is
that they assume a fixed architecture for the states, and a fixed relationship
between the states and the observations. That is to say, they are not geared towards inferring the structure of the model. One could apply the model-selection
techniques of §2, but methods of direct inference have also been developed. A
popular one relies on variable-length Markov models, also called context trees or probabilistic suffix trees (97-100).
A suffix here is the string at the end of the y time series at a given time, so, for example, the binary series abbabbabb has suffixes b, bb, abb, babb, etc., but not bab. A suffix is a context if the future of the series is independent of its past, given the suffix.
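To make these definitions concrete, here is a small Python sketch (the helper names are mine, not from the text) that lists the suffixes of abbabbabb and tallies the empirical next-symbol distribution given a suffix, which is the quantity a context test compares across longer and longer suffixes:

    from collections import Counter

    def suffixes(series, max_len=4):
        return [series[-k:] for k in range(1, min(max_len, len(series)) + 1)]

    def next_symbol_counts(series, suffix):
        # counts of the symbol immediately following each occurrence of suffix
        k = len(suffix)
        return Counter(series[i + k]
                       for i in range(len(series) - k)
                       if series[i:i + k] == suffix)

    series = "abbabbabb"
    print(suffixes(series))                 # ['b', 'bb', 'abb', 'babb']
    print(next_symbol_counts(series, "b"))  # Counter({'b': 3, 'a': 2})

A suffix such as b would count as a context if conditioning on any longer history ending in b left this next-symbol distribution unchanged.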
given the suffix. Context-tree algorithms try to identify contexts by iteratively
considering longer and longer suffixes, until they find one that seems to be a
context. For instance, in a binary series, such an algorithm would first try