Identifying CpG Islands: Sliding Window and Hidden Markov Model Approaches - Mathematical Concepts and Methods in Modern Biology

Table 9.3 The parameter set for the HMM from Example 9.3 in tabular form.

Transitions      Emissions      Initial Distribution
0.95   0.05      0.67   0.33    0.5
0.1    0.9       0.40   0.60    0.5

Table 9.4 A set of probabilities (parameters) of a HMM for a DNA sequence where the model is only concerned with the frequencies of the individual nucleotides. The transition matrix of the HMM is under the “Transitions” heading. Each of the hidden states emits a symbol from the set M = {A, C, T, G} with emission probabilities listed under the “Emissions” heading. The hidden process is equally likely to begin in the “+” and “−” state, as stated under the “Initial Distribution” heading.

       Transitions      Emissions                      Initial Distribution
       +      −         A      C      T      G
+      0.90   0.10      0.15   0.33   0.16   0.36      0.5
−      0.05   0.95      0.27   0.24   0.26   0.23      0.5
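As a minimal sketch, the parameters of Table 9.4 can be written down as plain Python dictionaries and used to compute the probability that the model emits a given nucleotide string. The recursion used here is the standard forward algorithm, brought in only for illustration. The names `sequence_probability`, `TRANSITIONS`, `EMISSIONS`, and `INITIAL` are hypothetical, the ASCII "-" stands for the “−” state, and the emission column order A, C, T, G is an assumption taken from the order of the set M in the caption.

```python
# Parameters copied from Table 9.4.
# Assumption: the emission columns are ordered A, C, T, G.
STATES = ("+", "-")
SYMBOLS = ("A", "C", "T", "G")

TRANSITIONS = {
    "+": {"+": 0.90, "-": 0.10},
    "-": {"+": 0.05, "-": 0.95},
}
EMISSIONS = {
    "+": {"A": 0.15, "C": 0.33, "T": 0.16, "G": 0.36},
    "-": {"A": 0.27, "C": 0.24, "T": 0.26, "G": 0.23},
}
INITIAL = {"+": 0.5, "-": 0.5}


def sequence_probability(sequence):
    """Forward algorithm: P(sequence), summed over all hidden state paths."""
    if not sequence:
        return 1.0
    # forward[s] = P(symbols read so far, current hidden state is s)
    forward = {s: INITIAL[s] * EMISSIONS[s][sequence[0]] for s in STATES}
    for symbol in sequence[1:]:
        forward = {
            t: EMISSIONS[t][symbol]
            * sum(forward[s] * TRANSITIONS[s][t] for s in STATES)
            for t in STATES
        }
    return sum(forward.values())


if __name__ == "__main__":
    print(sequence_probability("CGCG"))
    print(sequence_probability("ATAT"))
```

Under these parameters a CG-rich string such as CGCG receives a higher probability than ATAT, reflecting the larger C and G emission probabilities of the “+” state.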

be used to construct a HMM. When we look only at the nucleotide frequencies as in Table 9.2 we can consider a HMM with a state space Q = {+, −}, where each of these states can emit a symbol from the set M = {A, C, T, G} with emission probabilities as those in Table 9.2. Assuming that the hidden process transitions between the “+” and “−” states are as in Figure 9.5 (where, in this case, we will identify the state U with “+” and the state F with “−”), the parameters for the HMM will be those in Table 9.4.

If we want the model to incorporate information about dinucleotides, as in the case of Table 9.1, the set of emitted symbols is again M = {A, C, T, G}, but now the emission events at each step are not independent from one another. If, say, the process is in the hidden state “+,” the probability for emitting a symbol C will depend upon the symbol emitted by the previous state and whether this symbol was emitted from the “+” or from the “−” hidden state. We can think of it as emitted from one of two hidden states C+ or C−. Thus, for each of the emission symbols k ∈ M we should have states k+ and k− in Q, leading to a state space Q = {A+, A−, C+, C−, T+, T−, G+, G−} for the hidden process. The matrix for the transitions within the subsets of the “+” and “−” states should be close to those in the transition matrices in Table 9.5 but switching between the “+” and “−” subsets Q+ = {A+, C+, T+, G+} and Q− = {A−, C−, T−, G−} of Q should also be allowed with some small probability. Table 9.5 presents this scenario.
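To make the eight-state construction concrete without inventing dinucleotide probabilities, the sketch below expands the two-state parameters of Table 9.4 into the state space Q = {A+, A−, C+, C−, T+, T−, G+, G−}. This is only one possible parameter set, not the values of Table 9.5. The names `A2`, `E2`, `PI2`, and `label` are hypothetical, the ASCII "-" stands for “−”, and the emission column order A, C, T, G in Table 9.4 is assumed.

```python
SIGNS = ("+", "-")
SYMBOLS = ("A", "C", "T", "G")

# Two-state parameters from Table 9.4 (column order A, C, T, G assumed).
A2 = {"+": {"+": 0.90, "-": 0.10}, "-": {"+": 0.05, "-": 0.95}}
E2 = {
    "+": {"A": 0.15, "C": 0.33, "T": 0.16, "G": 0.36},
    "-": {"A": 0.27, "C": 0.24, "T": 0.26, "G": 0.23},
}
PI2 = {"+": 0.5, "-": 0.5}

# Q = {A+, A-, C+, C-, T+, T-, G+, G-}, in the order used in the text.
STATES = [(k, s) for k in SYMBOLS for s in SIGNS]


def label(state):
    """Turn a (nucleotide, sign) pair such as ("C", "+") into "C+"."""
    return state[0] + state[1]


# Moving from k^s to l^t means: change sign s -> t, then emit l in sign t.
TRANSITIONS = {
    label(src): {
        label(dst): A2[src[1]][dst[1]] * E2[dst[1]][dst[0]] for dst in STATES
    }
    for src in STATES
}

# The process starts in k^s with probability pi(s) * e_s(k).
INITIAL = {label(st): PI2[st[1]] * E2[st[1]][st[0]] for st in STATES}

# Each state k+ or k- emits its own nucleotide k with probability 1.
EMISSIONS = {
    label(st): {sym: 1.0 if sym == st[0] else 0.0 for sym in SYMBOLS}
    for st in STATES
}
```

In this expansion every row of `TRANSITIONS` depends only on the sign of the current state and not on its nucleotide, which is what makes it a special case. A model that genuinely uses dinucleotide information would replace these rows with values that differ from nucleotide to nucleotide.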

Exercise 9.5. The HMM from Table 9.4 could be considered to be a special case of the general model from Table 9.5 with state space Q = {A+, A−, C+, C−, T+, T−, G+, G−}. Give a set of HMM parameters for the general HMM from
