Biology Reference
In-Depth Information
was proposed by Durbin et al. [ 16 ] based on a generative statistical
model called pair hidden Markov model (pair HMM). A pair HMM
consists of three states, i.e., match, insert, and delete states. Match
state emits a pair of residues and other states emit single reside at a
certain probability, while transitions between states also occur
probabilistically. Roughly speaking, the emission probability from
a match state is proportional to z i;j ¼
ð =T in the partition
function model, the transition probability between match and
insert/delete states to e v=T , and the duration probability of an
insert or delete state to e u=T . However, these probabilities are not
usually given from outside but learnt from a number of datasets of
homologous sequences through the expectation-maximization
algorithm [ 17 ].
If a pair HMM is decoded by the Viterbi algorithm [ 17 ]to
produce the maximal likelihood path, i.e., to generate the align-
ment with the maximal probability, the resultant alignment is
essentially the same as that obtained by the score-based algorithm
mentioned above. To maximally utilize the advantage of pair HMM
or other probabilistic alignment method, we may prefer another
type of decoding known as “maximal expected accuracy” (MEA)
[ 18 ]. To performMEA decoding, we first calculate p i , j according to
Eq. 3 or related one for every pair of i
e Sa i ;b j
, and
then run another dynamic programming routine like Eq. 1 without
gap penalties to obtain the path (alignment) that maximizes the
sum of p i , j on the path ( see Note 4 ). The MEA decoding or its
variants are widely used in recent MSA programs as an important
components of their architectures [ 19 , 20 ].
2
½
1
;
m
and j
2
½
1
;
n
The term “consistency” among pairwise alignments is used in
the literature to refer to two related but different concepts. In this
subsection, consistency in the narrower sense is discussed, whereas
consistency in the broader sense will be discussed later under
Subheading 2.5 .
Consider three sequences a , b , and c , and pairwise alignments
2.2 Consistency
and Consistency
Transformation
( c , a ) between them. In the narrower sense,
the triplet ( a i , b j , c k ) are said mutually consistent if a i ~ b j , b j ~ c k ,
and c k ~ a i are present in the given pairwise alignments
( a , b ),
( b , c ), and
A
A
A
A
( a , b ),
( c , a ), respectively. Strictly speaking according to the
original definition [ 21 ], each element, a i , b j ,or c k , is not a residue
but a node of the bipartite graph that represents the pairwise
alignment (Fig. 1c ), and a consistently aligned region is sandwiched
by adjacent edges that connect such nodes (shadowed area in
Fig. 1e ). Alternatively, each residue may be assigned to the node
of the alignment graph (Fig. 1b ). An edge that connects aligned
residues, e.g., a i ~ b j , is called “trace” [ 22 ]. The triplet of residues
( a i , b j , c k ) and the corresponding traces may be defined as consistent
in the same way as that defined above. As a consistently aligned
region may include not only traces but also indels, the original
( b , c ), and
A
A
Search WWH ::




Custom Search