Large-Scale Multiple Sequence Alignment and Tree Estimation Using SATe - Multiple Sequence Alignment Methods

Table 1 (continued)

| Algorithmic parameter | Software setting | Software setting choices | Description |
|---|---|---|---|
| Parallelization | “CPU(s) Available” | 1-16 | This determines whether SATé will be run in parallel mode |
| Multi-gene analysis | “Multi-Locus Data” checkbox / “Sequence files” button | Checked/unchecked; folder dialog box | This enables a multi-gene analysis. See the “Advanced Analysis” section |
| Miscellaneous algorithmic modifications | “Extra RAxML Search” checkbox | Checked/unchecked | Checking this makes SATé perform a RAxML analysis of the final alignment |
| Miscellaneous algorithmic modifications | “Two-Phase (not SATe)” checkbox | Checked/unchecked | Check to run a two-phase analysis (first align and then compute an ML tree) |

Choosing one of the settings in the “Quick Set” drop-down box will automatically configure the software settings to perform one of the SATé-II analyses described in ref. 23. Subsequent modifications to software settings will cause the “Quick Set” drop-down box to display the “(Custom)” choice.
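
The “Quick Set” behavior described above can be pictured with a minimal Python sketch. This is not SATé's actual GUI code: the class `SettingsPanel`, the preset names, and the preset values are all hypothetical and exist only to show the logic (choosing a preset overwrites the settings; any later change switches the display to “(Custom)”).

```python
from dataclasses import dataclass, field

# Hypothetical preset names and values, for illustration only. The real
# "Quick Set" choices and what they configure are defined by the SATé GUI.
PRESETS = {
    "Preset A": {"aligner": "MAFFT", "merger": "Muscle", "cpus": 1},
    "Preset B": {"aligner": "MAFFT", "merger": "Opal", "cpus": 1},
}


@dataclass
class SettingsPanel:
    """Toy model of a settings panel with a "Quick Set" selector."""
    settings: dict = field(default_factory=dict)
    quick_set: str = "(Custom)"

    def choose_quick_set(self, name: str) -> None:
        # Selecting a preset overwrites the current software settings.
        self.settings = dict(PRESETS[name])  # copy, so the preset is not mutated
        self.quick_set = name

    def change_setting(self, key: str, value) -> None:
        # Any manual change after a preset flips the display to "(Custom)".
        self.settings[key] = value
        self.quick_set = "(Custom)"
```

For example, calling `choose_quick_set("Preset B")` and then `change_setting("cpus", 4)` leaves the selector showing “(Custom)” while keeping the other preset values.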

5 Additional Guidelines for Selecting Algorithmic Parameters

“Aligner” method. The choice of method used to align the subsets has a large impact on the resulting alignment and tree. The default is MAFFT, because of its high accuracy on both simulated and biological data, for both nucleotide and amino acid datasets [2, 3, 13, 14, 23, 24]. However, Prank has also been used in studies [24], and has the advantage over MAFFT and other standard alignment methods of not “over-aligning” as much. Because Prank is slower than MAFFT, using Prank to align subsets should be accompanied by a reduction in the maximum subset size so that the runs can complete. Finally, Opal and ClustalW are also enabled. Opal presents memory challenges on large datasets and is not recommended unless the dataset is small enough for the available memory. ClustalW is fast and can be used on any dataset size, but may not provide the same accuracy as MAFFT.
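
The aligner guidance above can be summarized as a small checking helper. The following Python sketch is illustrative only and is not part of SATé: the function name `review_aligner_choice` and the `prank_shrink` factor are assumptions, since the text recommends reducing the maximum subset size for Prank but does not say by how much.

```python
ALIGNERS = ("MAFFT", "Prank", "Opal", "ClustalW")


def review_aligner_choice(aligner: str, max_subset_size: int,
                          prank_shrink: float = 0.5):
    """Return (suggested_max_subset_size, notes) for a subset aligner.

    prank_shrink is a hypothetical factor; the guideline only says the
    maximum subset size should be reduced when Prank is used.
    """
    if aligner not in ALIGNERS:
        raise ValueError(f"unknown aligner: {aligner!r}")

    notes = []
    size = max_subset_size
    if aligner == "Prank":
        # Prank is slower than MAFFT, so use smaller subsets to let runs finish.
        size = max(1, int(max_subset_size * prank_shrink))
        notes.append("Prank is slower than MAFFT; maximum subset size reduced.")
    elif aligner == "Opal":
        notes.append("Opal can run into memory limits on large datasets.")
    elif aligner == "ClustalW":
        notes.append("ClustalW is fast but may be less accurate than MAFFT.")
    # MAFFT is the default and needs no adjustment or note.
    return size, notes
```

Treating the reduction as a tunable parameter keeps the sketch honest about what the guideline leaves unspecified; a suitable value depends on the dataset and the time available.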

“Merger” method. Only Muscle and Opal are enabled for merging alignments. Muscle is the current default because it has low memory requirements, whereas Opal has high memory requirements. However, we strongly recommend Opal, because it generally produces more accurate alignments, unless you do not have sufficient memory for your dataset analysis. With a reasonable amount of memory on your laptop or desktop machine, this is unlikely to be a problem except for very large datasets (more than 10,000 sequences).
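
The merger recommendation can be expressed as a simple rule of thumb. The Python sketch below is one possible reading of the guideline, not SATé code: the function `suggest_merger` and its `memory_ok` parameter are hypothetical, and the fallback for unknown memory (treating datasets above 10,000 sequences as risky) is an assumption based on the threshold mentioned in the text.

```python
from typing import Optional

LARGE_DATASET = 10_000  # "very large" threshold mentioned in the guideline


def suggest_merger(num_sequences: int, memory_ok: Optional[bool] = None) -> str:
    """Suggest "Opal" or "Muscle" for merging subset alignments.

    memory_ok: True/False if you know whether memory is sufficient.
    If None, assume memory is only a concern above LARGE_DATASET sequences
    (an assumption made for this sketch).
    """
    if memory_ok is None:
        memory_ok = num_sequences <= LARGE_DATASET
    # Opal is generally more accurate; Muscle has lower memory requirements.
    return "Opal" if memory_ok else "Muscle"
```

Passing `memory_ok` explicitly overrides the size-based guess, which matters on machines with unusually large or small amounts of memory.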
