where the simulations have been generated by using the fair die). Are there any
emerging patterns? Be mindful that due to the stochastic nature of the process, you
should generate several sequences for each set of parameters. Try to summarize your
observations by answering the following questions:
1. Did you detect any difference in the performance of the Viterbi algorithm for the
sets of parameters from part (a) and (b)?
2. If you answered yes in (1), is the performance of the algorithm in decoding the
hidden sequence better for distributions like those in part (a) or for distributions
like those in part (b)?
3. Does the performance of the Viterbi algorithm improve when the values of p and
q are very close to 1?
4. What explanations can you provide for your findings from question (1)? It may
be helpful to revisit Exercises 9.1 and 9.2.
Exercise 9.9. Repeat Exercise 9.8, this time experimenting with the CpG Islands
application in the CpG Educate suite. Consider distributions such as those in Table 9.5
and (a) values for p and q close to 0.5; (b) values for p and q close to 1. Experiment
with both short and long sequences and notice that when p or q is very close to 1, the
hidden process and/or the maximal probability path may never switch between the
"+" and "−" states for relatively short sequences. Use the file Table_9.5.xlsx available
from the volume's website to generate sets of HMM parameters for different values of
p and q. Be sure to save the generated sets of HMM parameters in Comma Separated
Values (CSV) format and use the CSV file to load the parameters into the CpG Islands
application (refer to the CpG Educate tutorial [30] for more details).
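As a minimal sketch of the parameter-file step, the snippet below builds a two-state transition matrix from p and q and writes it to CSV. It assumes p = P(stay in "+") and q = P(stay in "−"); the actual column layout expected by the CpG Islands application comes from Table_9.5.xlsx and may differ, so the header row and file name here are illustrative only.

```python
import csv

def transition_rows(p, q):
    """Two-state CpG transition matrix, assuming p = P(stay in '+')
    and q = P(stay in '-'). Off-diagonal entries are the switch
    probabilities 1 - p and 1 - q."""
    return [["state", "+", "-"],
            ["+", p, 1 - p],
            ["-", 1 - q, q]]

def write_params_csv(path, p, q):
    # newline="" is required by the csv module on all platforms
    with open(path, "w", newline="") as fh:
        csv.writer(fh).writerows(transition_rows(p, q))

# Hypothetical example: p and q close to 1, as in part (b)
write_params_csv("params_p095_q090.csv", 0.95, 0.90)
```

Generating one such file per (p, q) pair makes it easy to sweep the parameter grid suggested in the exercise.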
9.4.2 Evaluation and Posterior Decoding:
The Forward-Backward Algorithm
In this section we outline a computationally efficient algorithm for computing $P(x)$,
the probability of an observed sequence $x = x_1 x_2 \cdots x_l$. We also discuss an
alternative decoding method called posterior decoding.
The forward algorithm is similar in idea to the Viterbi algorithm. Instead of keeping
track of the hidden sequences of maximal probability in state $j$ at time $t$, the
forward algorithm computes the probabilities of being in state $j$ having observed
the first $t$ symbols $x_1 x_2 \cdots x_t$ of the sequence $x$. Denote these probabilities by
$$f_j(t) = P(x_1 x_2 \cdots x_t,\ \pi_t = j), \quad j \in Q.$$
Recursively, as in the Viterbi algorithm, we can now compute
$$f_k(t+1) = P(x_1 x_2 \cdots x_{t+1},\ \pi_{t+1} = k),$$
the probability that the process is in state $k$ at time $t+1$ with emitted sequence
$x_1 x_2 \cdots x_{t+1}$:
$$f_k(t+1) = P(x_1 x_2 \cdots x_{t+1},\ \pi_{t+1} = k) = \left[\sum_{j \in Q} f_j(t)\, a_{jk}\right] e_k(x_{t+1}), \quad k \in Q.$$
 