similarity score between forward-forward comparison and
forward-reverse comparison can be too small to judge the direction.
To give a more stable result, the current version of MAFFT uses
the following procedure to determine the direction of each
sequence. Suppose that the n input sequences are numbered from
0 to n - 1. For sequence i (i = 1 to n - 1) other than the first
sequence,
1. Calculate the similarity scores, S_f(j), between sequence i and
sequences j (j = 0 to i - 1).
2. Calculate the similarity scores, S_r(j), between the reverse
complement of sequence i and sequences j (j = 0 to i - 1).
3. If max_j(S_f(j)) < max_j(S_r(j)), then sequence i is replaced with
its reverse complement.
This procedure requires O(n^2) comparisons and is slow when
the scores are calculated with DP. However, when the scores are
rapidly calculated based on the number of shared 6mers, the speed
is practical.
To run this calculation on the command line, use
    mafft --adjustdirection input > output
which computes the distances based on the number of shared
6mers. The slower but more exact calculation based on DP can be
selected with
    mafft --adjustdirectionaccurately input > output
Our preliminary assessment based on computer simulation showed
that the difference between these two options is small unless the
input sequences are highly divergent and short. Thus the
--adjustdirection option is recommended in most cases.
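The direction-determination procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not MAFFT's actual implementation: the function names (reverse_complement, kmer_counts, shared_kmers, adjust_directions) are hypothetical, the similarity score is assumed to be the plain count of shared 6mers, and earlier sequences are assumed to be compared in their already-adjusted direction.

```python
from collections import Counter

K = 6  # k-mer length; the text describes scores based on shared 6mers

# Complement table for DNA letters; other characters (e.g. N) pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_counts(seq: str, k: int = K) -> Counter:
    """Count every overlapping k-mer in seq (case-insensitive)."""
    s = seq.upper()
    return Counter(s[p:p + k] for p in range(len(s) - k + 1))


def shared_kmers(a: Counter, b: Counter) -> int:
    """Assumed similarity score: size of the multiset intersection of k-mers."""
    return sum((a & b).values())


def adjust_directions(seqs: list[str]) -> list[str]:
    """Orient each sequence relative to the sequences that precede it.

    Sequence 0 is kept as given. For each later sequence i, the best
    forward score max_j S_f(j) is compared with the best reverse-complement
    score max_j S_r(j) over all earlier sequences j; sequence i is flipped
    only when the forward maximum is strictly smaller.
    """
    adjusted = [seqs[0]]
    profiles = [kmer_counts(seqs[0])]
    for i in range(1, len(seqs)):
        fwd = seqs[i]
        rev = reverse_complement(fwd)
        fwd_profile = kmer_counts(fwd)
        rev_profile = kmer_counts(rev)
        best_f = max(shared_kmers(fwd_profile, p) for p in profiles)
        best_r = max(shared_kmers(rev_profile, p) for p in profiles)
        if best_f < best_r:
            adjusted.append(rev)
            profiles.append(rev_profile)
        else:
            adjusted.append(fwd)
            profiles.append(fwd_profile)
    return adjusted
```

Because each sequence is compared with all earlier ones, the loop performs on the order of n^2 score calculations, which is why a cheap score such as shared 6mer counts keeps the procedure practical.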
6 Adding Unaligned Sequences into an MSA
The need for MSAs with a large number of sequences is increasing,
as a result of advances in sequencing technologies. There are several
different approaches to enable larger MSAs, e.g., rapid algorithms
and parallelization. MAFFT [1, 39, 42] and many other programs
were recently developed or extended by incorporating these
advances. In our opinion, another promising approach for large
MSAs is the use of an existing alignment. A relatively small number
of sequences have been carefully aligned and annotated in databases,
e.g., [43-45]. Sometimes we align newly sequenced data into an
existing MSA taken from such a database. This is more efficient than
rebuilding the entire MSA from a set of ungapped sequences.
 