These types of methods have been studied intensively in recent years, and
many alternative methods, such as PicXAA-RNA [5], CentroidAlign [36],
and RCoffee [37], are available.
2.5 Profile Alignments
MAFFT has a subprogram to align two alignments.
This program is useful only when the two alignments are phylogenetically
separated. Careless application of this method results in
serious misalignments, as shown in [38] and Subheading 6.
We are preparing a safer option, --addprofile, to avoid such
mistakes.
This option does not return any result if the sequences in
alignment1 do not form a monophyletic cluster. Thus, this method
is not useful for every user and is still in the testing phase.
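The monophyly condition above can be illustrated with a minimal sketch. It assumes a rooted tree written as nested Python tuples with sequence names as leaves; the function names leaf_set and is_monophyletic are hypothetical, and this is not how MAFFT itself performs the check, which is not described here.

```python
def leaf_set(tree):
    """Collect the leaf names of a rooted tree written as nested tuples."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaf_set(child)
    return result


def is_monophyletic(tree, group):
    """Return True if some subtree contains exactly the sequences in `group`."""
    group = set(group)
    if leaf_set(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Toy tree (an assumption for illustration): ((s1, s2), s3) and (s4, s5).
tree = ((("s1", "s2"), "s3"), ("s4", "s5"))
print(is_monophyletic(tree, {"s1", "s2"}))  # s1 and s2 form one cluster
print(is_monophyletic(tree, {"s1", "s3"}))  # s2 sits inside this group's clade
```

In this toy case, a group such as {s1, s3} would be rejected because s2 falls inside the smallest subtree that contains it.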
2.6 MSA of a Large Number of Sequences
To align a large number of sequences, MAFFT has an approximate
option, PartTree [39], which skips the calculation of the full distance
matrix consisting of O(N^2) elements, where N is the number
of sequences. Instead, n sequences are randomly selected, and the
distances between these n sequences and the remaining sequences
are computed to classify all sequences into n groups. The n groups
are recursively subjected to the same process, creating a tree-like
classification. The time complexity of this process is O(N log N).
There are several subtypes of the PartTree option. In the fastest one,
distances are computed based on the number of shared
6mers. A more accurate subtype is also available,
in which distances are computed based on DP. The application of
DP to a large dataset might seem impractical, but the PartTree
algorithm drastically restricts the number of
DP runs. Accordingly, this option is feasible and gives slightly
better accuracy than the 6mer-based option in our tests. See [39]
for details. The latest version of the Clustal series, Clustal Omega
[6], provides an alternative method for large MSA, using the mBed
algorithm [40].
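The recursive classification used by PartTree can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not MAFFT's implementation: the function names, the normalization of the shared-6mer count into a distance, and the purely random choice of reference sequences are all assumptions made for this example; see [39] for the actual procedure.

```python
import random


def kmer_set(seq, k=6):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_distance(a, b, k=6):
    """Distance in [0, 1] based on shared k-mers.

    Normalizing by the smaller k-mer set is an assumption of this sketch.
    """
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    if not ka or not kb:
        return 1.0
    return 1.0 - len(ka & kb) / min(len(ka), len(kb))


def part_tree(names, seqs, n=2, rng=None):
    """Classify sequences recursively around n randomly chosen references.

    Returns a nested list whose leaves are sequence names. Requires n >= 2
    so that every group is strictly smaller than its parent.
    """
    rng = rng or random.Random(0)
    if len(names) <= n:
        return list(names)
    refs = rng.sample(names, n)
    groups = {r: [r] for r in refs}  # each reference seeds its own group
    for name in names:
        if name in groups:
            continue
        # Only n distances per sequence, never the full N x N matrix.
        nearest = min(
            refs, key=lambda r: shared_kmer_distance(seqs[name], seqs[r])
        )
        groups[nearest].append(name)
    return [part_tree(members, seqs, n, rng) for members in groups.values()]


def leaves(tree):
    """Flatten the nested classification back into a list of names."""
    out = []
    for node in tree:
        out.extend(leaves(node) if isinstance(node, list) else [node])
    return out


# Toy input (hypothetical sequences, for illustration only).
seqs = {
    "a": "ACGTACGTACGT",
    "b": "ACGTACGTACGA",
    "c": "TTTTGGGGCCCC",
    "d": "TTTTGGGGCCCA",
    "e": "ACGTACGTACGG",
}
tree = part_tree(list(seqs), seqs, n=2)
print(tree)
```

Each level of the recursion computes n distances per sequence, so when the groups stay reasonably balanced the depth grows like log N, which is consistent with the O(N log N) behavior stated above. Swapping shared_kmer_distance for a DP-based distance would correspond to the more accurate subtype, at a higher cost per comparison.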