9.4.3 Training: The Baum-Welch Algorithm
For all computations until now, we always assumed the parameters of the HMM are known. Those parameters, however, are usually only initial estimates and, in most cases, we may not even have those estimates to begin with. In this section we will discuss how to estimate the HMM parameters from the data. This process is called training (or sometimes learning). The algorithm presented here was first introduced in [31] (see also [32]). We begin with an observed sequence $x$ or a set of observed sequences $x_1, x_2, \ldots, x_m$ for which we want to adjust the HMM model parameters to ensure the best possible fit. Once a set of parameters is determined for those sequences, we can apply the Viterbi algorithm or use the posterior probabilities to decode other similar sequences. The sequences $x_1, x_2, \ldots, x_m$ are called training sequences and we say that we will train the model to best fit those sequences.
More formally, this means that given the data $x_1, x_2, \ldots, x_m$ and the HMM, we need to find the values of the model parameters (the transition probabilities $a_{jk}$ and the emission probabilities $e_k(b)$ for $j, k \in Q$ and $b \in M$) that maximize the probability of the data $P(x)$. If we use $\theta$ to refer to the whole set of parameters of the model, in this section we will sometimes write $P(x) = P(x \mid \theta) = P(x \mid \theta, \mathrm{HMM})$ to emphasize that the likelihood of the data depends on the set of parameters $\theta$ and on the model. The goal is to find $\theta = \operatorname{argmax}_{\theta} \{ P(x \mid \theta) \}$.
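For a single sequence $x$, the likelihood $P(x \mid \theta)$ can be evaluated with the forward algorithm used in the earlier sections. The following minimal Python sketch illustrates this; the dictionary-based parameter layout and the initial distribution `pi` are assumptions made for the example, not notation from the text (the model may instead start from a silent begin state).

```python
def likelihood(x, states, a, e, pi):
    """P(x | theta) via the forward algorithm.

    a[j][k] : transition probability from state j to state k
    e[k][b] : probability that state k emits symbol b
    pi[k]   : assumed initial distribution over states
    """
    # f[k] = P(x_1 .. x_i, state at position i is k)
    f = {k: pi[k] * e[k][x[0]] for k in states}
    for symbol in x[1:]:
        f = {k: e[k][symbol] * sum(f[j] * a[j][k] for j in states)
             for k in states}
    return sum(f.values())  # P(x | theta), summed over the final state
```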
If the training sequences are annotated and we know the exact paths of the process emitting the training sequences, the estimates for the model parameters will be computed by determining the frequencies of each transition and emission. Assume that $A_{jk}$ is the number of transitions from $j$ to $k$ in the set of training sequences and $E_k(b)$ is the number of times the state $k \in Q$ emitted the symbol $b \in M$. Then the frequencies
$$a_{jk} = \frac{A_{jk}}{\sum_{r \in Q} A_{jr}} \quad \text{and} \quad e_k(b) = \frac{E_k(b)}{\sum_{c \in M} E_k(c)}, \qquad \text{for } j, k \in Q,\; b \in M, \qquad (9.8)$$
are maximum likelihood estimates for the HMM [23].
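When the state paths are known, Eq. (9.8) amounts to counting and normalizing. A minimal Python sketch under the same assumed dictionary layout (the function and variable names are illustrative only):

```python
def ml_estimates(training_pairs, states, alphabet):
    """Maximum likelihood estimates of Eq. (9.8) from annotated sequences.

    training_pairs : list of (x, path) where x is a symbol sequence and
                     path is the known state sequence of the same length.
    Assumes every state emits and leaves at least once in the training
    data, so no denominator below is zero.
    """
    A = {j: {k: 0 for k in states} for j in states}    # transition counts A_jk
    E = {k: {b: 0 for b in alphabet} for k in states}  # emission counts E_k(b)
    for x, path in training_pairs:
        for i, (symbol, state) in enumerate(zip(x, path)):
            E[state][symbol] += 1
            if i + 1 < len(path):
                A[state][path[i + 1]] += 1
    # Normalize the counts row by row, exactly as in Eq. (9.8)
    a = {j: {k: A[j][k] / sum(A[j].values()) for k in states} for j in states}
    e = {k: {b: E[k][b] / sum(E[k].values()) for b in alphabet} for k in states}
    return a, e
```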
When the paths that generate the training data are unknown, we can no longer determine the counts $A_{jk}$ and $E_k(b)$, but they can be replaced with the expected counts for each transition/emission in the HMM. Obtaining the maximum likelihood estimates in this case can be done by an iterative process known as the Baum-Welch algorithm [31]. This method is no longer guaranteed to find a global maximum for the probability of the data, but the iterative steps will converge to a local maximum. The Baum-Welch algorithm is a special case of a more general class of methods known as Expectation Maximization algorithms, or EM algorithms. Below we will outline the computational aspect of the Baum-Welch method but will omit the theoretical proofs of convergence to a maximum. Those proofs, together with the general theory of EM algorithms, can be found in [33]. In the description of the Baum-Welch algorithm below we will assume that the HMM is being trained on a single data sequence $x$.
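To give a concrete picture of how expected counts take the place of the actual counts in Eq. (9.8), here is a compact Python sketch of a single re-estimation step on one sequence. It uses the forward and backward recursions with the same assumed dictionary layout and initial distribution `pi` as above, and it omits the scaling needed to avoid numerical underflow on long sequences.

```python
def baum_welch_step(x, states, alphabet, a, e, pi):
    """One Baum-Welch re-estimation step on a single sequence x (sketch).

    Replaces the counts A_jk and E_k(b) of Eq. (9.8) with their expected
    values under the current parameters, then renormalizes.
    """
    L = len(x)
    # Forward: f[i][k] = P(x_1 .. x_{i+1}, state k at position i+1)
    f = [{k: pi[k] * e[k][x[0]] for k in states}]
    for i in range(1, L):
        f.append({k: e[k][x[i]] * sum(f[i - 1][j] * a[j][k] for j in states)
                  for k in states})
    px = sum(f[L - 1].values())                 # P(x | current parameters)
    # Backward: bw[i][k] = P(x_{i+2} .. x_L | state k at position i+1)
    bw = [None] * L
    bw[L - 1] = {k: 1.0 for k in states}
    for i in range(L - 2, -1, -1):
        bw[i] = {j: sum(a[j][k] * e[k][x[i + 1]] * bw[i + 1][k] for k in states)
                 for j in states}
    # Expected transition and emission counts
    A = {j: {k: sum(f[i][j] * a[j][k] * e[k][x[i + 1]] * bw[i + 1][k]
                    for i in range(L - 1)) / px
             for k in states} for j in states}
    E = {k: {b: sum(f[i][k] * bw[i][k] for i in range(L) if x[i] == b) / px
             for b in alphabet} for k in states}
    # Renormalize as in Eq. (9.8), now with expected counts
    a_new = {j: {k: A[j][k] / sum(A[j].values()) for k in states} for j in states}
    e_new = {k: {b: E[k][b] / sum(E[k].values()) for b in alphabet} for k in states}
    return a_new, e_new, px
```

Iterating this step until $P(x \mid \theta)$ stops improving yields the local maximum described above.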