Biomedical Engineering Reference
In-Depth Information
new protein sequence belongs to the modeled family, even with low sequence identity
[11, 12]. A protein sequence can be aligned to an HMM to determine the probability
if it belongs to the modeled family. This alignment can be computed by another by
dynamic programming based alignment algorithm: the Viterbi algorithm.
The structure of an HMM to model a protein sequence family is called a profile
HMM (Fig. 7.5). It consists of a linear sequence of nodes. Each node has a match
( M ), insert ( I ), and delete state ( D ). Between the nodes are transitions with associ-
ated probabilities. Each match state and insert state also contains a position-specific
table with probabilities for emitting a particular amino acid. Both transition and emis-
sion probabilities can be generated from a multiple sequence alignment of a protein
family [11].
An HMM can be compared (aligned) with a given sequence to determine the proba-
bility that the sequence belongs to the modeled family. The most probable path through
the HMM that generates a sequence equal to the given sequence determines the
similarity score. The well-known Viterbi algorithm computes this score by dynamic
programming. The computation is given by the following recurrence relations.
M(i , j)
=
e(M j , s i )
+
max
{
M(i
1, j
1 )
+
t(M j 1 , M j ) , I(i
1, j
1 )
+
+
}
t(I j 1 , M j ) , D(i
1, j
1 )
t(D j 1 , M j )
=
+
{
+
+
I(i , j)
e(I j , s i )
max
M(i
1, j)
t(M j , I j ) , I(i
1, j)
t(I j , I j ) , D(i
1, j)
+
t(D j , I j )
}
D(i , j)
=
max
{
M(i , j
1 )
+
t(M j 1 , D j ) , I(i , j
1 )
+
t(I j 1 , D j ) , D(i , j
1 )
+
t(D j 1 , D j )
}
where tr (state1, state2) is the transition cost from state1 to state2 and e(M j , s i ) is
the emission cost of amino acid s i at state M j . M(i , j) denotes the score of the best
path matching sub-sequence s 1 , ... , s i to the submodel up to state j , ending with s i
being emitted by state M j . Similarly I(i , j) is the score of the best path ending in s i
being emitted by I j , and, D(i , j) for the best path ending in state D j . Initialization
D 1
D 2
D 3
D 4
I 0
I 1
I 2
I 3
I 4
M 0
(Start)
M 5
(End)
M 1
M 2
M 3
M 4
Figure 7.5 The transition structure of a profile HMM of length 4. Squares represent match
states, circles represent delete states, and diamonds represent insertions.
Search WWH ::




Custom Search