Table 9.2 Sample nucleotide frequencies from [23]. The entries are computed
by averaging the columns of Table 9.1 for the ("+") and for the ("−") regions.

                                  A      C      T      G
Island ("+") Frequencies:        0.15   0.33   0.16   0.36
Non-Island ("−") Frequencies:    0.27   0.24   0.26   0.23
to have a way of obtaining some estimates for those from the data. This may initially
seem to be an impossible task. After all, how can we estimate those frequencies when
the boundaries between the regions are unknown? As we will see below, there is a way
to solve this problem if we view the DNA data as generated by a known underlying
mechanism, formally described by an HMM.
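As a simple illustration (not taken from the text) of how frequency tables like Table 9.2 can be used, the following sketch scores a short DNA string under two independent-nucleotide models: a positive log-odds score means the island ("+") frequencies explain the string better than the non-island ("−") frequencies. The function name `log_odds` is our own; the numbers are those of Table 9.2.

```python
# Illustrative sketch: log-odds scoring of a DNA string using the
# island ("+") and non-island ("-") frequencies of Table 9.2.
from math import log2

island = {"A": 0.15, "C": 0.33, "T": 0.16, "G": 0.36}      # "+" frequencies
non_island = {"A": 0.27, "C": 0.24, "T": 0.26, "G": 0.23}  # "-" frequencies

def log_odds(seq):
    """Sum over the string of log2( p+(x) / p-(x) ) for each nucleotide x."""
    return sum(log2(island[x] / non_island[x]) for x in seq)

print(log_odds("CGCG"))  # positive: C and G are more frequent in islands
print(log_odds("ATAT"))  # negative: A and T are more frequent outside islands
```

Of course, such a score ignores where one region ends and the next begins; handling the unknown boundaries is precisely what the HMM machinery below is for.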
The rest of the chapter is organized as follows: Section 3 contains a brief review of
Markov chains and HMMs, presenting some examples, highlighting some main concepts,
and introducing the notation. Readers without prior experience with Markov
chains or HMMs may need to consult a more detailed introduction to these topics
such as [25, 26]. In Section 4 we present CGI identification methods based on
HMMs. The section focuses on the questions of evaluation, decoding, and parameter
estimation for HMMs. Some final comments and notes on generalizations and ongoing
work involving HMMs for CGI identification are gathered in Section 5. A suite of
web applications, CpG Educate, has been developed by the authors to illustrate the
algorithms and is used for many of the examples and exercises in the chapter. CpG
Educate is freely available at http://inspired.jsu.edu/agarrett/cpg/.
9.3 DEFINITION AND BASIC PROPERTIES OF MARKOV
CHAINS AND HIDDEN MARKOV MODELS
9.3.1 Finite State Markov Chains
A Markov chain is a “process” that at any particular time is in one of a finite number
of “states” from a set Q. We will only consider discrete time, which
means that the process can be visualized as running according to a digital timer,
possibly changing states at each time step. The Markov property assumes the state of
the process at the next step depends only on its present state and not on the history
of how the process arrived at that state. Thus, the process has no memory beyond
the “present” and the only factors that govern the process are (i) its current state, and
(ii) the probabilities with which the process moves from its current state to other states.
Assume that Q = {s_1, ..., s_n}.
As time goes on, the process transitions from one state to another generating a path
π = π_1 π_2 ··· π_l (the number l is the length of the path), where π_t ∈ Q denotes
the state of the process at time t, t = 1, 2, 3, .... If we have observed the path
of the process up to time t, the formal definition of the Markov property is that

    P{π_{t+1} | π_1 π_2 ··· π_t} = P{π_{t+1} | π_t},    t = 1, 2, 3, ....
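The Markov property above can be seen directly in a simulation: to generate each new state of a path, only the current state's row of transition probabilities is consulted, never the earlier history. The following is a minimal sketch; the three-state set and the transition matrix are made-up illustrative values, not from the text.

```python
# Minimal sketch: simulating a path pi = pi_1 pi_2 ... pi_l of a
# finite-state Markov chain. Each next state is drawn using only the
# current state's row of the transition matrix (the Markov property).
import random

states = ["s1", "s2", "s3"]
# transition[i][j] = P{ next state = states[j] | current state = states[i] };
# each row sums to 1. These numbers are illustrative only.
transition = [
    [0.8, 0.1, 0.1],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
]

def simulate_path(start, length, rng=random):
    """Generate a path of the given length, beginning in state `start`."""
    path = [start]
    current = states.index(start)
    for _ in range(length - 1):
        # Draw the next state index using only the current row of weights.
        current = rng.choices(range(len(states)), weights=transition[current])[0]
        path.append(states[current])
    return path

random.seed(1)
print(simulate_path("s1", 10))
```

Note that the function keeps the whole path only for reporting; the draw itself uses `current` alone, which is exactly the "no memory beyond the present" behavior described above.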