Biology Reference
In-Depth Information
It can be shown that the computational complexity of the Viterbi algorithm is no
worse than O
n 2 l
in time 8 and O
is the number of states
and l is the length of the sequence. Thus, the complexity of the algorithm is quadratic
in time, which, although still computationally demanding, is a huge improvement
over the exponential rate of the brute force approach.
Since the Viterbi algorithm involves multiplying many probabilities, the results
could get very small, resulting in essentially zero probabilities and causing under-
flow problems. To avoid this, all calculations are performed on the logarithms of the
probabilities
(
)
(
nl
)
in space, where n
=|
Q
|
, using addition instead of multiplication.
Exercise 9.6. Show that the hidden path found by the Viterbi algorithm is exactly
the path of maximal probability, given the observed sequence x . That is, show that
v k (
t
),
k
Q
,
t
=
1
,
2
,...
π max =
argmax
π
P
(
x
,π) =
argmax
π
P
|
x
).
Obviously, Example 9.5 shows that performing the Viterbi computations by hand
is tedious even for very short paths. From now on, we will use the CpG Educate suite
that implements the Viterbi algorithm for HMMs corresponding to the Dishonest
Casino problem from Example 9.4 and for the CGI identification problem.
The applications in the CpGEducate suite have the capability to generate an output
from a HMM and record the hidden path that produced the emitted sequence. This
simulated data can then be used to test the “accuracy” of the Viterbi algorithm by
comparing the predicted states from the Viterbi decoding with the actual ones from
the simulations.
Example 9.6. We used the Dishonest Casino application in the CpGEducate suite to
generate a sequence of length 400 for the Dishonest Casino HMM from Example 9.4 .
We then applied the Viterbi algorithm to the emitted sequence and compared the
output with the known hidden sequence from the simulations. Figure 9.7 depicts the
outcome. The gray background identifies rolls made with the unfair die. The outcomes
in bold are the predicted unfair die state from the path of maximal probability found
by the Viterbi algorithm. Although the overlap is not perfect, Viterbi appears to have
recovered the states fairly well. In practice, the Viterbi algorithmwill often miss short
sequences generated by either die. Since switching between the dice happens with
small probabilities, the path of maximal probability would generally tend to “bridge”
short gaps and preserve the trends set by longer runs.
When using a HMM for identifying CGIs, the outcome of the Viterbi algorithm
will produce the predicted path through the hidden “+” and “
” states. The start
of the island will be where the predicted hidden sequence switches from the subset
Q
={
A
,
C
,
T
,
G
}
to the subset Q
+ ={
A
+ ,
C
+ ,
T
+ ,
G
+ }
and the end of the
island will be where the sequence switches back from the “
+
”tothe“
” states.
8 The “big O ” notation here means that for very large values of n and l the number of operations
required by the Viterbi algorithm has the same order of magnitude as the number n 2 l .
 
Search WWH ::




Custom Search