Heuristic Alignment Methods - Multiple Sequence Alignment Methods

Biology Reference

In-Depth Information

was proposed by Durbin et al. [ 16 ] based on a generative statistical

model called pair hidden Markov model (pair HMM). A pair HMM

consists of three states, i.e., match, insert, and delete states. Match

state emits a pair of residues and other states emit single reside at a

certain probability, while transitions between states also occur

probabilistically. Roughly speaking, the emission probability from

a match state is proportional to z i;j ¼

ð =T in the partition

function model, the transition probability between match and

insert/delete states to e v=T , and the duration probability of an

insert or delete state to e u=T . However, these probabilities are not

usually given from outside but learnt from a number of datasets of

homologous sequences through the expectation-maximization

algorithm [ 17 ].

If a pair HMM is decoded by the Viterbi algorithm [ 17 ]to

produce the maximal likelihood path, i.e., to generate the align-

ment with the maximal probability, the resultant alignment is

essentially the same as that obtained by the score-based algorithm

mentioned above. To maximally utilize the advantage of pair HMM

or other probabilistic alignment method, we may prefer another

type of decoding known as “maximal expected accuracy” (MEA)

[ 18 ]. To performMEA decoding, we first calculate p i , j according to

Eq. 3 or related one for every pair of i

e Sa i ;b j

, and

then run another dynamic programming routine like Eq. 1 without

gap penalties to obtain the path (alignment) that maximizes the

sum of p i , j on the path ( see Note 4 ). The MEA decoding or its

variants are widely used in recent MSA programs as an important

components of their architectures [ 19 , 20 ].

;

and j

;

The term “consistency” among pairwise alignments is used in

the literature to refer to two related but different concepts. In this

subsection, consistency in the narrower sense is discussed, whereas

consistency in the broader sense will be discussed later under

Subheading 2.5 .

Consider three sequences a , b , and c , and pairwise alignments

2.2 Consistency

and Consistency

Transformation

( c , a ) between them. In the narrower sense,

the triplet ( a i , b j , c k ) are said mutually consistent if a i ~ b j , b j ~ c k ,

and c k ~ a i are present in the given pairwise alignments

( a , b ),

( b , c ), and

( a , b ),

( c , a ), respectively. Strictly speaking according to the

original definition [ 21 ], each element, a i , b j ,or c k , is not a residue

but a node of the bipartite graph that represents the pairwise

alignment (Fig. 1c ), and a consistently aligned region is sandwiched

by adjacent edges that connect such nodes (shadowed area in

Fig. 1e ). Alternatively, each residue may be assigned to the node

of the alignment graph (Fig. 1b ). An edge that connects aligned

residues, e.g., a i ~ b j , is called “trace” [ 22 ]. The triplet of residues

( a i , b j , c k ) and the corresponding traces may be defined as consistent

in the same way as that defined above. As a consistently aligned

region may include not only traces but also indels, the original

( b , c ), and

Multiple Sequence Alignment Methods

Search WWH ::

Custom Search

Home