MAFFT: Iterative Refinement and Additional Methods - Multiple Sequence Alignment Methods - page 132

similarity score between forward-forward comparison and

forward-reverse comparison can be too small to judge the direction.

To give a more stable result, the current version of MAFFT uses

the following procedure to determine the direction of each

sequence. Suppose that the n input sequences are numbered from 0 to n - 1. For each sequence i (i = 1 to n - 1) other than the first sequence:

1. Calculate the similarity scores, S_f(j), between sequence i and sequences j (j = 0 to i - 1).

2. Calculate the similarity scores, S_r(j), between the reverse complement of sequence i and sequences j (j = 0 to i - 1).

3. If max_j(S_f(j)) < max_j(S_r(j)), then sequence i is replaced with its reverse complement.

This procedure requires O(n^2) comparisons and is slow when the scores are calculated with DP. However, when the scores are rapidly calculated based on the number of shared 6mers, the speed is practical.

To run this calculation on the command line, use the --adjustdirection option, which computes the distances based on the number of shared 6mers. A slower but more exact calculation based on DP is also available.

Our preliminary assessment based on computer simulation showed

that the difference between these two options is small unless the

input sequences are highly divergent and short. Thus the

--adjustdirection option is recommended in most cases.
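
The direction-determination procedure described above can be illustrated with a short Python sketch. This is not MAFFT's implementation: the function names (`reverse_complement`, `kmer_counts`, `shared_kmers`, `adjust_direction`) are hypothetical, and the similarity score is assumed here to be the size of the multiset intersection of 6mers, which may differ from the exact 6mer-based score MAFFT computes. The sketch also assumes that once a sequence has been replaced with its reverse complement, later sequences are compared against the replaced version.

```python
from collections import Counter

# Complement table for DNA/RNA letters; other symbols (e.g. N) pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_counts(seq, k=6):
    """Count every overlapping k-mer in seq (case-insensitive)."""
    s = seq.upper()
    return Counter(s[p:p + k] for p in range(len(s) - k + 1))


def shared_kmers(a, b):
    """Assumed similarity score: size of the multiset intersection of k-mers."""
    return sum((a & b).values())


def adjust_direction(seqs, k=6):
    """Orient each sequence relative to the sequences that precede it.

    Returns (oriented, flipped): the re-oriented sequences and a list of
    booleans marking which inputs were replaced by their reverse complement.
    """
    if not seqs:
        return [], []
    oriented = [seqs[0]]          # sequence 0 defines the reference direction
    flipped = [False]
    profiles = [kmer_counts(seqs[0], k)]
    for i in range(1, len(seqs)):
        rc = reverse_complement(seqs[i])
        fwd_profile = kmer_counts(seqs[i], k)
        rev_profile = kmer_counts(rc, k)
        # Steps 1 and 2: best forward and best reverse-complement scores
        # against sequences 0 .. i-1.
        s_f = max(shared_kmers(fwd_profile, p) for p in profiles)
        s_r = max(shared_kmers(rev_profile, p) for p in profiles)
        # Step 3: flip only when the reverse complement scores strictly higher.
        flip = s_f < s_r
        oriented.append(rc if flip else seqs[i])
        profiles.append(rev_profile if flip else fwd_profile)
        flipped.append(flip)
    return oriented, flipped
```

For sequence i the loop performs 2i score calculations, so n sequences need n(n - 1) in total, which is the O(n^2) cost noted above. Precomputing each 6mer profile once keeps each comparison cheap, in contrast to a DP-based score.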

6 Adding Unaligned Sequences into an MSA

The need for MSAs with a large number of sequences is increasing,

as a result of advances in sequencing technologies. There are several

different approaches to enable larger MSAs, e.g., rapid algorithms

and parallelization. MAFFT [1, 39, 42] and many other programs

were recently developed or extended by incorporating these

advances. In our opinion, another promising approach for large

MSAs is the use of an existing alignment. A relatively small number

of sequences have been carefully aligned and annotated in databases,

e.g., [43-45]. Sometimes we align newly sequenced data into an

existing MSA taken from such a database. This is more efficient than

rebuilding the entire MSA from a set of ungapped sequences.
