Biology Reference
In-Depth Information
alignment under the sum-of-pairs scoring system. Gotoh [ 61 ] also
succeeded in acceleration of the algorithm when N is large based on
the “generalized profile” data structure, where “generalized”means
that various lengths of internal gaps, as well as ordinary residues,
are treated in the form of profile. The difficult point is to compute
G ( p , q , l ) exactly. In the worst case, this problem is proven to be
NP-complete [ 62 ], but in practice not much more than O ( NL 2 )
operations are sufficient to reach the optimality [ 63 , 64 ]. However,
this type of exact method is still considerably slower than approxi-
mate methods that use only the ordinary profile vectors. Except for
PRRN/PRRP [ 52 , 63 ] and OPAL [ 43 ], currently popular MSA
programs, such as ClustalW [ 36 ], MAFFT [ 40 , 65 ] and MUSCLE
[ 53 ], adopt a fast approximate method to evaluate gap open penal-
ties. An intermediate approach that calculates G ( p , q , l ) exactly but
omits the branch-and-bound extension is employed in PRIME
[ 66 ] to introduce double affine gap penalties in MSA. T-Coffee
[ 24 ], ProbCons [ 19 ], and their derivatives are not bothered with
the procedure to evaluate gap penalties in MSA, as gaps are not
penalized at the progressive and refinement stages.
Note that many column scores other than the sum-of-pairs
column score (the first term in the square brackets in Eq. 4 ) are
compatible with progressive alignment and iterative refinement.
In fact, MUSCLE [ 41 ] and HHsearch [ 67 ] (a component of
Clustal
) use different implementations of column-specific log
odds score. Two studies [ 68 , 69 ] on relative performance of various
column scores indicated that most of the scores they examined
behave similarly to one another in alignment accuracy. However,
further studies might be necessary on this point [ 70 ]. Meanwhile,
Edgar [ 71 ] recently noticed that VTML amino acid substitution
matrix [ 72 ] works best for accurate MSA construction by MUS-
CLE under the optimized gap penalties.
At the step (2), the existingMSA is horizontally divided into two
groups. As there are 2 N 1
Ω
1 possible ways for division, trials of all
of them [ 57 , 58 ] are time consuming. Hirosawa et al. [ 73 ] examined
variousmethods of partitioning, and reached the conclusion that tree-
restricted partitioning (partitioning at each edge of the guide tree) is
best in balance between accuracy and computational costs. Gotoh
[ 74 ] also reached the same partitioning scheme fromanother point of
view, i.e., this partitioning ideally fit calculation of WSP with the pair
weights w p , q obtained by the three-way method, an approximation of
the Rational II model of Altschul [ 60 ]. Weighting sequences or
sequence pairs will be discussed in the last subsection in more detail.
The tree-restricted partitioning is the default iterative refinement
strategy of PRRN [ 52 ], MAFFT [ 65 ] and MUSCLE [ 53 ].
Fast homology search programs such as BLAST [ 75 ] and BLAT
[ 76 ] generally adopts seed and extension strategies to find high-
scoring segment pairs (HSPs). The concept of HSP is easily
2.5 Anchoring
Search WWH ::




Custom Search