MAFFT: Iterative Refinement and Additional Methods - Multiple Sequence Alignment Methods

3 Difference Between MAFFT and MUSCLE

MUSCLE [7, 8] is another high-performance MSA program. It adopted the overall design of the NW-NS-i option of MAFFT (see Subheading 2.2). Other options corresponding to NW-NS-1 and NW-NS-2 (see Subheading 2.1) can be selected by specifying the number of iterations. The accuracies of these options are close to those of the corresponding MAFFT options. However, MUSCLE and MAFFT differ in several respects, including the scoring system and the weighting system. Among these differences, MUSCLE made a great contribution to this area by introducing an approximate tree-building algorithm with a time complexity of O(N^2), where N is the number of sequences. At that time, this algorithm was remarkably faster than those used by other programs. It was subsequently adopted by MAFFT [39] and the Clustal series [40]. MAFFT made a slight modification so that the resulting tree is exactly identical to that produced by the standard method. Because of this modification, the tree-building step is slightly faster in MUSCLE than in MAFFT without the PartTree option.

4 Dot Plot

All the options in MAFFT assume that there are no genomic rearrangements (translocations or inversions). By default, MAFFT accelerates group-to-group alignment calculations with the FFT algorithm [1]. It first finds highly conserved regions and then aligns the remaining regions using DP, as shown in Fig. 2. Thus MAFFT can align long DNA sequences more efficiently than normal DP if a number of highly conserved regions are found. Genomic rearrangements can result in conserved regions that appear in an inconsistent order. In such a case, DP has to be applied almost directly. This sometimes takes an impractically long time, and the result does not make sense.
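The idea of conserved regions appearing in a "consistent" or "inconsistent" order can be illustrated with a small sketch. This is not MAFFT's code; it only assumes that each conserved region has already been reduced to a pair of start positions, one in each sequence. The function names and the input format are hypothetical. The sketch finds the largest set of regions whose positions increase in both sequences; if that set is smaller than the input, some regions are out of order, which is the situation a rearrangement produces.

```python
from bisect import bisect_left


def longest_collinear_chain(anchors):
    """Return the largest subset of anchors that increases in both sequences.

    Each anchor is a hypothetical (start_in_a, start_in_b) pair describing
    one conserved region shared by sequences A and B.
    """
    # Sort by position in A; ties are ordered by descending B position so that
    # two anchors with the same A position can never be chained together.
    ordered = sorted(anchors, key=lambda p: (p[0], -p[1]))

    tails = []      # tails[k]: smallest B position that ends a chain of length k + 1
    tails_idx = []  # index in `ordered` of the anchor stored in tails[k]
    prev = [None] * len(ordered)  # back-pointers used to rebuild the chain

    for i, (_, b) in enumerate(ordered):
        k = bisect_left(tails, b)
        if k == len(tails):
            tails.append(b)
            tails_idx.append(i)
        else:
            tails[k] = b
            tails_idx[k] = i
        prev[i] = tails_idx[k - 1] if k > 0 else None

    # Walk the back-pointers from the end of the longest chain.
    chain = []
    i = tails_idx[-1] if tails_idx else None
    while i is not None:
        chain.append(ordered[i])
        i = prev[i]
    return chain[::-1]


def is_collinear(anchors):
    """True if every anchor fits into one chain that increases in A and B."""
    return len(longest_collinear_chain(anchors)) == len(anchors)


# Made-up anchor positions, for illustration only.
collinear = [(10, 12), (200, 190), (450, 470), (800, 805)]
rearranged = [(10, 600), (200, 790), (450, 20), (800, 370)]
print(is_collinear(collinear), is_collinear(rearranged))
```

When the anchors are collinear, DP is needed only for the stretches between consecutive anchors. When they are not, no single chain covers all conserved regions, so the anchors cannot all be used to restrict the DP.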

To avoid such cases, the web version of MAFFT displays dot plots between the first sequence and each of the remaining sequences, computed with the LAST local alignment program [41], for every nucleotide alignment run. By viewing the dot plots, a user can easily check for genomic rearrangements and the directions of input sequences.
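To make concrete what a dot plot reveals about sequence direction, here is a minimal sketch that substitutes exact k-mer matching for LAST. It is a much cruder method than the local alignments the web server computes, and the function names, the default k = 8, and the example sequence are all illustrative assumptions. Each hit is one dot: hits against the query as given trace forward diagonals, and hits against its reverse complement indicate a sequence entered in the opposite direction.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def kmer_hits(ref, query, k=8):
    """Return (ref_pos, query_pos) for every exact k-mer shared by ref and query.

    Each pair is one dot of a naive dot plot (exact matches only, unlike LAST).
    """
    ref, query = ref.upper(), query.upper()
    index = {}
    for i in range(len(ref) - k + 1):
        index.setdefault(ref[i:i + k], []).append(i)
    hits = []
    for j in range(len(query) - k + 1):
        for i in index.get(query[j:j + k], ()):
            hits.append((i, j))
    return hits


def guess_direction(ref, query, k=8):
    """Compare dot counts for the query as given and for its reverse complement."""
    forward = len(kmer_hits(ref, query, k))
    reverse = len(kmer_hits(ref, reverse_complement(query), k))
    if forward == reverse:
        return "unclear"
    return "forward" if forward > reverse else "reverse"


# Made-up example sequence, for illustration only.
ref = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
print(guess_direction(ref, ref))
print(guess_direction(ref, reverse_complement(ref)))
```

Plotting the pairs returned by `kmer_hits` with any scatter-plot tool gives a rough picture of the same kind: a single rising diagonal for collinear sequences, and broken or shifted diagonals when segments have moved.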

Some examples are shown in Fig. 4. If a plot like panel d is returned by the server, the calculation should be re-run with the “Adjust direction” option (for the web version) or with the --adjustdirection option (for the command-line version), as noted in the next section. If a more complicated plot, like panel e, is returned, other tools that assume genomic rearrangements should be applied,

4.1 Example
