Heuristic Alignment Methods - Multiple Sequence Alignment Methods

Biology Reference

In-Depth Information

rescoring of S ( a i , b j ) may be repeated several times. ProbCons then

uses a progressive method to construct the MSA, optionally fol-

lowed by iterative refinement. Several protein MSA programs that

follow the framework of ProbCons have been developed [ 26 - 28 ],

which are among the most accurate MSA programs currently avail-

able. Moreover, genomic MSA program Pecan [ 29 , 30 ] also uses

basically the same strategy as ProbCons.

The progressive method is not only the first practical MSA con-

struction strategy [ 31 ] but also constitutes the core of a majority of

contemporary MSA programs, besides “pure” progressive pro-

grams such as Aln3nn [ 32 ], Kalign2 [ 33 ], Prank [ 34 ], Clustal

2.3 Progressive

Method

Ω

[ 35 ], and ClustalW [ 36 ]. A progressive method consists of four

steps, (1)-(4) as follows:

(1) Calculate a distance matrix from every pair of the N input

sequences. In a standard way, N ( N

1)/2 pairwise align-

ments are performed to count the numbers of matches, mis-

match, and indels, which are then converted to the distance

measures. This procedure is costly when N is large, as it takes

O ( N 2 L 2 ) computational time. Hence, more economical

alignment-free methods [ 37 ] are often used to obtain the

initial distance matrix. Among various alignment-free meth-

ods, Muth-Manber algorithm [ 38 ] that is tolerant to single

mismatches/indels appears to be most sensitive. While

Kalign2 [ 33 ] adopts this method, most other programs use

simpler methods based on counts of common k -mers [ 39 ],

sometimes in combination with reduced amino-acid alphabets

[ 40 , 41 ].

(2) Construct a guide tree by a hierarchical clustering algorithm.

The UPGMA method [ 42 ] is most widely used. Unexpectedly,

however, several reports suggest that single linkage clustering

method produces consistently better alignment than the

UPGMA method [ 43 , 44 ]. Both UPGMA and single linkage

trees can be constructed in O ( N 2 )[ 41 , 45 ]. The PartTree option

of MAFFT [ 46 ] and the sequence-embedding technique [ 47 ]

adoptedbyClustal

[ 35 ] bypass a large part of step (1), enabling

very rapid construction of a guide tree.

(3) A leaf of the guide tree corresponds to each input sequence,

whereas an internal node corresponds to an MSA that is

constructed by pairwise alignment of sequence(s) or MSA(s)

corresponding to its children. Most MSA programs hold the

MSA corresponding to an internal node in the form of profile

[ 48 ] or “generalized profile” (see the next subsection),

whereas several MSA programs use a directed acyclic graph

(DAG) that can represent a set of alternative alignments rather

than a unique MSA [ 49 - 51 ]. Generally speaking, the DAG-

Ω

Multiple Sequence Alignment Methods

Search WWH ::

Custom Search

Home