where the sum is over all possible hidden sequences π = π_1 π_2 π_3 ··· π_l with π_t ∈ Q, and where

    P(x, π) = a_{0π_1} ∏_{i=1}^{l} e_{π_i}(x_i) a_{π_i π_{i+1}}.        (9.2)
As with the decoding problem, the mathematical solution from Eq. (9.1) has no practical value, since the number of such paths grows exponentially as the length l of the path grows.
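To see Eq. (9.2) and the exponential blow-up concretely, here is a minimal sketch that enumerates all |Q|^l hidden paths for a toy two-state HMM; the fair/loaded-coin states and all parameter values are hypothetical, chosen only for illustration.

```python
from itertools import product

# Hypothetical toy HMM (fair "F" vs. loaded "L" coin); all numbers are made up.
Q = ["F", "L"]                                   # hidden states
a0 = {"F": 0.5, "L": 0.5}                        # initial probabilities a_{0,pi_1}
a = {("F", "F"): 0.9, ("F", "L"): 0.1,           # transition probabilities a_{jk}
     ("L", "F"): 0.1, ("L", "L"): 0.9}
e = {("F", "H"): 0.5, ("F", "T"): 0.5,           # emission probabilities e_k(x)
     ("L", "H"): 0.75, ("L", "T"): 0.25}

def joint_prob(x, pi):
    """P(x, pi) as in Eq. (9.2): a_{0,pi_1} * prod_i e_{pi_i}(x_i) * a_{pi_i,pi_{i+1}}.
    The final transition out of pi_l (into an end state) is taken to be 1 here."""
    p = a0[pi[0]]
    for i in range(len(x)):
        p *= e[(pi[i], x[i])]
        if i + 1 < len(x):
            p *= a[(pi[i], pi[i + 1])]
    return p

x = "HHTH"
paths = list(product(Q, repeat=len(x)))          # |Q|^l paths: 2^4 = 16 here
total = sum(joint_prob(x, pi) for pi in paths)   # P(x), summed over all hidden paths
print(len(paths), total)
```

Already at l = 4 there are 16 paths; at l = 100 there would be 2^100, which is why the brute-force sum is unusable in practice.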
3. Training (Learning) Problem: Given an observed sequence x, or a set of observed sequences, what are the HMM parameters that make the sequence x most likely to occur? The answer to this question provides estimates for the parameters of the HMM from a data set of observed sequences.
9.4.1 Decoding: The Viterbi algorithm

The Viterbi algorithm is a recursive and computationally efficient method for computing the most likely state sequence π_max for a given observed sequence x = x_1 x_2 x_3 ··· x_l (see [28], and also [29] for an interesting account of the history by Andrew Viterbi himself).
To understand the recursive step, let's assume that for any state j ∈ Q we have somehow computed the hidden sequence π_1 π_2 π_3 ··· π_{t−2} π_{t−1} of highest probability among those emitting the observations x_1 x_2 x_3 ··· x_{t−1} up to time t − 1, with π_{t−1} = j.
That is, we assume that for each j ∈ Q, we have determined the most probable path of length t − 1 for the data x_1 x_2 x_3 ··· x_{t−1}, ending in state j. Denote the probability of this path by v_j(t − 1):

    v_j(t − 1) = max_{π = π_1 π_2 π_3 ··· π_{t−1}} P(π_{t−1} = j, x_1 x_2 ··· x_{t−1}),    j ∈ Q.
Next, we can use v_j(t − 1) to compute the most probable path of length t ending in each of the states k ∈ Q and emitting the sequence x_1 x_2 x_3 ··· x_t:

    v_k(t) = max_{π = π_1 π_2 π_3 ··· π_t} P(π_t = k, x_1 x_2 ··· x_t).
The probability of the most likely path of length t ending in k and emitting x_t will be the largest among the probabilities for hidden sequences that get into some state j at time t − 1 with maximal probability, transition from j to k at time t, and emit x_t. Thus

    v_k(t) = max_{j ∈ Q} { v_j(t − 1) a_{jk} e_k(x_t) } = e_k(x_t) max_{j ∈ Q} { v_j(t − 1) a_{jk} }.
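A single step of this recursion can be sketched as follows; the two-state HMM and the values of v_j(t − 1) here are hypothetical, for illustration only.

```python
# A minimal sketch of one Viterbi recursion step; the toy two-state HMM
# parameters below are hypothetical.
Q = ["F", "L"]
a = {("F", "F"): 0.9, ("F", "L"): 0.1,
     ("L", "F"): 0.1, ("L", "L"): 0.9}
e = {("F", "H"): 0.5, ("L", "H"): 0.75}

def viterbi_step(v_prev, x_t):
    """Compute v_k(t) = e_k(x_t) * max_{j in Q} v_j(t-1) * a_{jk} for every k,
    remembering the argmax state j as the backpointer ptr(k, t)."""
    v, ptr = {}, {}
    for k in Q:
        best_j = max(Q, key=lambda j: v_prev[j] * a[(j, k)])
        v[k] = e[(k, x_t)] * v_prev[best_j] * a[(best_j, k)]
        ptr[k] = best_j
    return v, ptr

v_prev = {"F": 0.25, "L": 0.375}   # v_j(t-1) from some earlier step (made up)
v, ptr = viterbi_step(v_prev, "H")
print(v, ptr)
```

Each new value v_k(t) costs one maximization over |Q| states, so a full run is O(l·|Q|²) rather than exponential in l.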
Iterating this argument for all t = 1, 2, ..., l will allow us to compute the probability of the most likely path π = π_1 π_2 π_3 ··· π_l for the data x = x_1 x_2 x_3 ··· x_l. To be able to recover the path π = π_1 π_2 π_3 ··· π_l itself, for each time t and for each state k ∈ Q, we keep pointers (ptr) to remember the state r ∈ Q from which the maximal-probability path ending in k came. For each t = 1, 2, ..., l and k ∈ Q we record k's predecessor
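Putting the initialization, the recursion, and the pointer bookkeeping together, the whole algorithm can be sketched as follows; the fair/loaded-coin HMM and its parameters are again hypothetical.

```python
# A full sketch of the Viterbi algorithm with backpointers (ptr), under a
# hypothetical fair/loaded-coin HMM; all parameter values are illustrative.
Q = ["F", "L"]
a0 = {"F": 0.5, "L": 0.5}
a = {("F", "F"): 0.9, ("F", "L"): 0.1,
     ("L", "F"): 0.1, ("L", "L"): 0.9}
e = {("F", "H"): 0.5, ("F", "T"): 0.5,
     ("L", "H"): 0.75, ("L", "T"): 0.25}

def viterbi(x):
    """Return (max probability, most likely state path) for observations x."""
    # Initialization: v_k(1) = a_{0k} e_k(x_1)
    v = {k: a0[k] * e[(k, x[0])] for k in Q}
    ptr = []                                   # ptr[t][k] = predecessor of k at time t+1
    # Recursion: v_k(t) = e_k(x_t) max_j v_j(t-1) a_{jk}
    for t in range(1, len(x)):
        v_new, back = {}, {}
        for k in Q:
            best_j = max(Q, key=lambda j: v[j] * a[(j, k)])
            v_new[k] = e[(k, x[t])] * v[best_j] * a[(best_j, k)]
            back[k] = best_j
        ptr.append(back)
        v = v_new
    # Termination: best final state, then trace the pointers backwards
    last = max(Q, key=lambda k: v[k])
    path = [last]
    for back in reversed(ptr):
        path.append(back[path[-1]])
    return v[last], "".join(reversed(path))

prob, path = viterbi("HHHH")
print(prob, path)
```

On the all-heads sequence this toy model decodes, as one would expect, a run of the loaded state.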