These types of methods have been studied intensively in recent years, and many alternative methods, such as PicXAA-RNA [5], CentroidAlign [36], and RCoffee [37], are available.
2.5 Profile Alignments
MAFFT has a subprogram to align two alignments.
This program is useful only when two alignments are phylogenetically separated. Careless application of this method results in serious misalignments, as shown in [38] and Subheading 6.
We are preparing a safer option, --addprofile, to avoid such mistakes.
This option does not return any result if the sequences in alignment1 do not form a monophyletic cluster. Thus, this option is not useful for every user, and it is still in the testing phase.
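The monophyly condition can be illustrated with a small sketch. It assumes a rooted tree written as nested Python tuples with sequence names at the leaves; the function names `leaf_set` and `is_monophyletic` and the toy tree are hypothetical and are not part of MAFFT, whose internal check may differ.

```python
def leaf_set(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    leaves = set()
    for child in tree:
        leaves |= leaf_set(child)
    return leaves


def is_monophyletic(tree, group):
    """True if some subtree contains exactly the sequences in `group`."""
    group = set(group)
    if leaf_set(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Hypothetical rooted tree: ((a, b), (c, (d, e)))
tree = (("a", "b"), ("c", ("d", "e")))
print(is_monophyletic(tree, {"d", "e"}))  # d and e form a clade
print(is_monophyletic(tree, {"b", "c"}))  # b and c are split across clades
```

In this toy setting, a group of sequences passes the check only if one subtree contains exactly those sequences and no others.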
2.6 MSA of a Large Number of Sequences
To align a large number of sequences, MAFFT has an approximate option, PartTree [39], which skips the calculation of the full distance matrix consisting of $O(N^2)$ elements, where $N$ is the number of sequences. Instead, $n$ sequences are randomly selected and the distances between the $n$ sequences and the remaining sequences are computed to classify the sequences into $n$ groups. The $n$ groups are recursively subjected to the same process, to create a tree-like classification. The time complexity of this process is $O(N \log N)$.
There are several subtypes of the PartTree option. In the fastest one, distances are computed based on the number of shared 6mers. A more accurate subtype is also available, in which distances are computed based on DP. The application of
DP to a large dataset might seem to be impractical, but as a result
of the PartTree algorithm, we can drastically restrict the number of
DP runs. Accordingly, this option is feasible and gives slightly
better accuracy than the 6mer-based option in our tests.
See [39] for details. The latest version of the Clustal series, Clustal Omega [6], provides an alternative method for large MSA, using the mBed algorithm [40].
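The recursive classification used by PartTree, as described above, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not MAFFT's implementation: the function names (`kmers`, `shared_kmer_distance`, `part_tree`), the toy distance formula based on shared 6mers, and the example sequences are all hypothetical.

```python
import random


def kmers(seq, k=6):
    """Set of all k-mers occurring in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_distance(a, b, k=6):
    """Toy distance (an assumption): 1 - shared k-mers / size of smaller k-mer set."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 1.0
    return 1.0 - len(ka & kb) / min(len(ka), len(kb))


def part_tree(names, seqs, n=2, rng=None):
    """Recursively classify sequences around n randomly chosen references.

    Returns a nested list: leaves are sequence names, inner lists are groups.
    """
    if n < 2:
        raise ValueError("n must be at least 2 so that every group shrinks")
    rng = rng or random.Random(0)
    if len(names) <= n:
        return list(names)
    refs = rng.sample(names, n)
    groups = {r: [r] for r in refs}
    for name in names:
        if name in groups:
            continue
        # Only distances to the n references are computed, never all pairs.
        nearest = min(refs, key=lambda r: shared_kmer_distance(seqs[name], seqs[r]))
        groups[nearest].append(name)
    return [part_tree(g, seqs, n, rng) if len(g) > 1 else g[0]
            for g in groups.values()]


# Hypothetical toy input: two families of similar sequences.
seqs = {
    "s1": "ACGTACGTACGTAA",
    "s2": "ACGTACGTACGTAC",
    "s3": "TTGGCCAATTGGCC",
    "s4": "TTGGCCAATTGGCA",
    "s5": "ACGTACGTACGTAG",
}
tree = part_tree(sorted(seqs), seqs, n=2)
print(tree)
```

Because each sequence is compared only with the $n$ references at each level, the full distance matrix is never built. Replacing `shared_kmer_distance` with a DP-based distance would correspond to the more accurate subtype, at a higher cost per comparison.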