formation functions (§7), and so forth—see (10) for details. Moreover, it is often
easier to determine these properties from a system's grammar than from direct
examination of sequence statistics, especially since specialized techniques are
available for grammatical inference (92,93).
3.6.1. Hidden Markov Models
The most important special case of this general picture is that of regular
languages. These, we said, are generated by machines with only a finite memory. More exactly, there is a finite set of states x, with two properties (a generative sketch in code follows the list):
1. The distribution of y_t depends solely on x_t, and
2. The distribution of x_{t+1} depends solely on x_t.
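A minimal generative sketch of these two properties in Python, using a made-up two-state chain; the transition matrix T, emission matrix E, and all numerical values are illustrative assumptions, not values from the text:

    import numpy as np

    rng = np.random.default_rng(0)

    T = np.array([[0.9, 0.1],   # T[i, j] = P(x_{t+1} = j | x_t = i)
                  [0.2, 0.8]])
    E = np.array([[0.7, 0.3],   # E[i, k] = P(y_t = k | x_t = i)
                  [0.1, 0.9]])

    x, ys = 0, []
    for _ in range(10):
        ys.append(rng.choice(2, p=E[x]))  # property 1: y_t drawn from x_t alone
        x = rng.choice(2, p=T[x])         # property 2: x_{t+1} drawn from x_t alone

The hidden x values are discarded here; only the noisy sequence ys would be observed.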
That is, the x sequence is a Markov chain, and the observed y sequence is a
noisy function of that chain. Such models are very familiar in signal processing
(94), bioinformatics (95), and elsewhere, under the name of hidden Markov
models (HMMs). They can be thought of as a generalization of ordinary
Markov chains to the state-space picture described in §3.1. HMMs are particularly useful in filtering applications, since very efficient algorithms exist for determining the most probable values of x from the observed sequence y. The expectation-maximization (EM) algorithm (96) even allows us to simultaneously infer the most probable hidden states and the most probable parameters for the model.
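The standard dynamic program for recovering the most probable state path is the Viterbi algorithm. Below is a minimal sketch, reusing the illustrative T and E from the sketch above and assuming a uniform initial distribution pi; it is an illustration under those assumptions, not the reference implementation of (94) or (96):

    import numpy as np

    def viterbi(ys, T, E, pi):
        # log-probability of the best path ending in each state at time t
        logp = np.log(pi) + np.log(E[:, ys[0]])
        back = []                                # backpointers for path recovery
        for y in ys[1:]:
            cand = logp[:, None] + np.log(T)     # cand[i, j]: best path into j via i
            back.append(cand.argmax(axis=0))
            logp = cand.max(axis=0) + np.log(E[:, y])
        path = [int(logp.argmax())]              # best final state...
        for bp in reversed(back):
            path.append(int(bp[path[-1]]))       # ...traced backwards
        return path[::-1]

    # e.g. viterbi([0, 1, 1, 0], T, E, np.array([0.5, 0.5]))

The EM (Baum-Welch) procedure mentioned above alternates a closely related dynamic program, the forward-backward recursions, with re-estimation of T and E.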
3.6.2. Variable-Length Markov Models
The main limitation of ordinary HMM methods, even the EM algorithm, is
that they assume a fixed architecture for the states, and a fixed relationship
between the states and the observations. That is to say, they are not geared towards inferring the structure of the model. One could apply the model-selection
techniques of §2, but methods of direct inference have also been developed. A
popular one relies on variable-length Markov models, also called context trees or probabilistic suffix trees (97-100).
A suffix here is the string at the end of the y time series at a given time, so, for example, the binary series abbabbabb has suffixes b, bb, abb, babb, etc., but not bab. A suffix is a context if the future of the series is independent of its past, given the suffix.
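To make these definitions concrete, here is a small Python sketch (the helper names are mine, not from the text) that lists the suffixes of abbabbabb and tallies the empirical next-symbol distribution given a suffix, which is the quantity a context test compares across longer and longer suffixes:

    from collections import Counter

    def suffixes(series, max_len=4):
        return [series[-k:] for k in range(1, min(max_len, len(series)) + 1)]

    def next_symbol_counts(series, suffix):
        # counts of the symbol immediately following each occurrence of suffix
        k = len(suffix)
        return Counter(series[i + k]
                       for i in range(len(series) - k)
                       if series[i:i + k] == suffix)

    series = "abbabbabb"
    print(suffixes(series))                 # ['b', 'bb', 'abb', 'babb']
    print(next_symbol_counts(series, "b"))  # Counter({'b': 3, 'a': 2})

A suffix such as b would count as a context if conditioning on any longer history ending in b left this next-symbol distribution unchanged.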
given the suffix. Context-tree algorithms try to identify contexts by iteratively
considering longer and longer suffixes, until they find one that seems to be a
context. For instance, in a binary series, such an algorithm would first try