Biology Reference
In-Depth Information
rescoring of S ( a i , b j ) may be repeated several times. ProbCons then
uses a progressive method to construct the MSA, optionally fol-
lowed by iterative refinement. Several protein MSA programs that
follow the framework of ProbCons have been developed [ 26 - 28 ],
which are among the most accurate MSA programs currently avail-
able. Moreover, genomic MSA program Pecan [ 29 , 30 ] also uses
basically the same strategy as ProbCons.
The progressive method is not only the first practical MSA con-
struction strategy [ 31 ] but also constitutes the core of a majority of
contemporary MSA programs, besides “pure” progressive pro-
grams such as Aln3nn [ 32 ], Kalign2 [ 33 ], Prank [ 34 ], Clustal
2.3 Progressive
Method
Ω
[ 35 ], and ClustalW [ 36 ]. A progressive method consists of four
steps, (1)-(4) as follows:
(1) Calculate a distance matrix from every pair of the N input
sequences. In a standard way, N ( N
1)/2 pairwise align-
ments are performed to count the numbers of matches, mis-
match, and indels, which are then converted to the distance
measures. This procedure is costly when N is large, as it takes
O ( N 2 L 2 ) computational time. Hence, more economical
alignment-free methods [ 37 ] are often used to obtain the
initial distance matrix. Among various alignment-free meth-
ods, Muth-Manber algorithm [ 38 ] that is tolerant to single
mismatches/indels appears to be most sensitive. While
Kalign2 [ 33 ] adopts this method, most other programs use
simpler methods based on counts of common k -mers [ 39 ],
sometimes in combination with reduced amino-acid alphabets
[ 40 , 41 ].
(2) Construct a guide tree by a hierarchical clustering algorithm.
The UPGMA method [ 42 ] is most widely used. Unexpectedly,
however, several reports suggest that single linkage clustering
method produces consistently better alignment than the
UPGMA method [ 43 , 44 ]. Both UPGMA and single linkage
trees can be constructed in O ( N 2 )[ 41 , 45 ]. The PartTree option
of MAFFT [ 46 ] and the sequence-embedding technique [ 47 ]
adoptedbyClustal
[ 35 ] bypass a large part of step (1), enabling
very rapid construction of a guide tree.
(3) A leaf of the guide tree corresponds to each input sequence,
whereas an internal node corresponds to an MSA that is
constructed by pairwise alignment of sequence(s) or MSA(s)
corresponding to its children. Most MSA programs hold the
MSA corresponding to an internal node in the form of profile
[ 48 ] or “generalized profile” (see the next subsection),
whereas several MSA programs use a directed acyclic graph
(DAG) that can represent a set of alternative alignments rather
than a unique MSA [ 49 - 51 ]. Generally speaking, the DAG-
Ω
Search WWH ::




Custom Search