sequence $x$. We will comment on the straightforward generalization to multiple sequences $x_1, x_2, \dots, x_m$ afterwards.

The Baum-Welch algorithm generates a sequence of approximations $\theta_0, \theta_1, \theta_2, \dots$ for the set of HMM parameters, with each new set of parameters $\theta$ improving the value of $P(x \mid \theta)$ over the previous iteration. The process terminates when two successive iterations produce the same value for $P(x \mid \theta)$, or values that are closer than a previously chosen tolerance value.
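Before developing the expected counts, it may help to see the shape of the iteration itself. The following Python sketch is only an outline: `e_step` and `m_step` are caller-supplied placeholders for the expected-count computation and the re-estimation of Eq. (9.8) developed below, all names are illustrative assumptions rather than notation from the text, and it compares log-likelihoods rather than raw probabilities purely for numerical stability.

```python
import numpy as np

def baum_welch(x, theta0, e_step, m_step, tol=1e-6, max_iter=1000):
    """Iterate until successive values of log P(x | theta) differ by less than tol.

    e_step(x, theta) -> (expected_counts, loglik)   # Eqs. (9.9)-(9.10), below
    m_step(expected_counts) -> theta                # Eq. (9.8) on expected counts
    """
    theta, prev_loglik = theta0, -np.inf
    for _ in range(max_iter):
        counts, loglik = e_step(x, theta)
        if loglik - prev_loglik <= tol:   # two successive iterations close enough
            break
        theta, prev_loglik = m_step(counts), loglik
    return theta
```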
To initialize the Baum-Welch algorithm, choose any set of model parameters, incorporating any prior information that may be available. Absent such information, a uniform or any other arbitrary distribution may be chosen. Denote the set of those initial parameters by $\theta_0$. To obtain improved estimates, we still want to use Eq. (9.8), but this time the counts $A_{jk}$ and $E_k(b)$ will be replaced with the expected counts computed from the data. To compute those expected counts, we will first compute $P(\pi_t = j, \pi_{t+1} = k \mid x, \theta_0)$: the probability that the hidden process will transition from state $j$ to state $k$ at time $t$. The inclusion of the parameter set $\theta_0$ in the notation indicates that these probabilities will be computed based on the initial set of parameters $\theta_0$. We will omit this from the notation from now on for simplicity, but all of the expressions below assume that we use this parameter set for the computations. Using conditional probabilities, we obtain
$$
P(\pi_t = j, \pi_{t+1} = k \mid x) = \frac{P(x_1, x_2, \dots, x_t, \pi_t = j)\, P(\pi_{t+1} = k, x_{t+1}, x_{t+2}, \dots, x_l \mid x_1, x_2, \dots, x_t, \pi_t = j)}{P(x)}
$$

$$
= \frac{f_j(t)\, P(\pi_{t+1} = k, x_{t+1}, x_{t+2}, \dots, x_l \mid \pi_t = j)}{P(x)}, \tag{9.9}
$$
the last equation following from the Markov property.$^{11}$ We have also used the notation $f_j(t) = P(x_1, x_2, \dots, x_t, \pi_t = j)$, introduced earlier for the forward algorithm.
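As a concrete aid, the table of values $f_j(t)$ can be computed in a few lines. This is a minimal Python sketch assuming transition probabilities $a_{jk}$, emission probabilities $e_k(s)$, an initial state distribution `pi0`, and a sequence `x` encoded as integer symbol indices; all of these names are assumptions of the sketch, not notation fixed by the text.

```python
import numpy as np

def forward_table(x, a, e, pi0):
    """f[t, j] = P(x_1, ..., x_{t+1}, pi_{t+1} = j), with rows indexed from 0."""
    l, n = len(x), a.shape[0]
    f = np.zeros((l, n))
    f[0] = pi0 * e[:, x[0]]                 # f_j(1) = pi0_j * e_j(x_1)
    for t in range(1, l):
        f[t] = e[:, x[t]] * (f[t - 1] @ a)  # f_k(t+1) = e_k(x_{t+1}) * sum_j f_j(t) a_{jk}
    return f
```

In particular, $P(x)$, which appears in the denominators below, is the sum of the last row of this table.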
The probability $P(\pi_{t+1} = k, x_{t+1}, x_{t+2}, \dots, x_l \mid \pi_t = j)$ can further be expressed as

$$
P(\pi_{t+1} = k, x_{t+1}, x_{t+2}, \dots, x_l \mid \pi_t = j) = P(\pi_{t+1} = k \mid \pi_t = j)\, P(x_{t+1} \mid \pi_{t+1} = k)\, P(x_{t+2}, \dots, x_l \mid \pi_{t+1} = k) = a_{jk}\, e_k(x_{t+1})\, b_k(t+1),
$$

where, as in the backward algorithm, we use $b_k(t+1) = P(x_{t+2}, \dots, x_l \mid \pi_{t+1} = k)$. The justification is as follows: Once the process is in state $j$ at time $t$, for the event $\pi_{t+1} = k, x_{t+1}, x_{t+2}, \dots, x_l$ to take place, the process needs to transition into $k$, emit the symbol $x_{t+1}$, and generate the rest of the sequence $x_{t+2}, \dots, x_l$. Combining this with Eq. (9.9), we obtain
$$
P(\pi_t = j, \pi_{t+1} = k \mid x) = \frac{f_j(t)\, a_{jk}\, e_k(x_{t+1})\, b_k(t+1)}{P(x)}. \tag{9.10}
$$
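Eq. (9.10) translates directly into code once the backward table is available. The sketch below continues the Python conventions of the earlier sketches (array names and 0-based indexing are assumptions of the sketch): `backward_table` fills in $b_k(t)$, and `transition_posteriors` evaluates Eq. (9.10) for every $t$, $j$, $k$ at once.

```python
import numpy as np

def backward_table(x, a, e):
    """b[t, j] = P(x_{t+2}, ..., x_l | pi_{t+1} = j), with rows indexed from 0."""
    l, n = len(x), a.shape[0]
    b = np.ones((l, n))                          # b_j(l) = 1 by convention
    for t in range(l - 2, -1, -1):
        # b_j(t) = sum_k a_{jk} e_k(x_{t+1}) b_k(t+1), shifted to 0-based indices
        b[t] = a @ (e[:, x[t + 1]] * b[t + 1])
    return b

def transition_posteriors(x, a, e, f, b):
    """xi[t, j, k] = P(pi_{t+1} = j, pi_{t+2} = k | x): Eq. (9.10) for all t, j, k."""
    px = f[-1].sum()                             # P(x), from the forward table
    l, n = len(x), a.shape[0]
    xi = np.empty((l - 1, n, n))
    for t in range(l - 1):
        # f_j(t) a_{jk} e_k(x_{t+1}) b_k(t+1) / P(x), over j (rows) and k (columns)
        xi[t] = f[t][:, None] * a * (e[:, x[t + 1]] * b[t + 1])[None, :] / px
    return xi
```

Summing the resulting array over $t$ yields the expected transition counts $A_{jk}$ that replace the observed counts in Eq. (9.8).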
$^{11}$ The likelihood of the data, given the set of model parameters $\theta$, $P(x) = P(x \mid \theta)$, is computed by the forward algorithm.