Large-Scale Multiple Sequence Alignment and Tree Estimation Using SATé - Multiple Sequence Alignment Methods

“sequence.fasta”, the PartTree algorithm can be invoked using the following command:

    mafft --parttree --retree 2 --partsize 1000 sequence.fasta > startingAlignment.fasta

The command to run Clustal Omega is:

    clustalo --auto --dealign -i sequence.fasta > startingAlignment.fasta

Once you have the alignment, you can provide it to SATé as the initial alignment (see above).
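The starting-alignment commands can also be driven from a script, which makes the redirect of standard output to startingAlignment.fasta explicit. The following is a minimal Python sketch and not part of SATé. The function names build_alignment_command and write_starting_alignment are hypothetical, the long options are written with two dashes as MAFFT and Clustal Omega expect, and the sketch assumes that the chosen tool is installed and on the PATH.

```python
import subprocess


def build_alignment_command(tool, fasta_path):
    """Return the argument list for the chosen starting-alignment tool."""
    if tool == "mafft-parttree":
        # PartTree mode with the parameters given in the text.
        return ["mafft", "--parttree", "--retree", "2",
                "--partsize", "1000", fasta_path]
    if tool == "clustalo":
        # --dealign strips any existing gaps before aligning.
        return ["clustalo", "--auto", "--dealign", "-i", fasta_path]
    raise ValueError(f"unknown tool: {tool!r}")


def write_starting_alignment(tool, fasta_path,
                             out_path="startingAlignment.fasta"):
    """Run the tool and write its standard output to out_path.

    This plays the role of the shell redirect '> startingAlignment.fasta'.
    check=True raises CalledProcessError if the aligner exits with an error.
    """
    cmd = build_alignment_command(tool, fasta_path)
    with open(out_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)
    return out_path


# Example call (requires mafft to be installed):
#   write_starting_alignment("mafft-parttree", "sequence.fasta")
```

Keeping the command construction separate from the execution makes it possible to inspect or log the exact command before a long alignment job is started.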

4. In the “External Tools” window, choose the following software settings: “MAFFT” in the “Aligner” dropdown menu, “Muscle” in the “Merger” dropdown menu, and “FastTree” in the “Tree Estimator” dropdown menu. For nucleotide analyses, select “GTR + CAT” in the “Model” dropdown menu; for protein analyses, select “JTT + CAT”.

5. In the “Sequences and Tree” window, provide your initial alignment (if available), and click on “initial alignment (use for initial tree)”. Follow from step 3 in Subheading 8.6.

6. In “Workflow Settings”, do not select “Extra RAxML Search” unless your dataset is not particularly big; the final RAxML search could be the most computationally intensive part of your analysis, and may not provide substantial benefits.

7. In the “Job Settings” window, make sure you provide the number of CPUs available (this will have a large impact on the running time if more than one CPU can be used in the analysis). Also make sure that the “Max. Memory (MB)” field specifies the correct amount of available memory, since memory limitations are a common cause of increased running times. See Note 7.
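The two values requested in step 7 can be looked up before opening the “Job Settings” window. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, the memory query relies on os.sysconf and therefore only works on Unix-like systems that report SC_PHYS_PAGES, and it reports total installed memory, which is an upper bound on what is actually available to the analysis.

```python
import os


def available_cpus():
    """Number of logical CPUs visible to Python (at least 1)."""
    return os.cpu_count() or 1


def physical_memory_mb():
    """Total physical memory in MB, or None if the platform does not report it."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing (e.g. Windows) or the name is unsupported.
        return None
    if page_size <= 0 or page_count <= 0:
        return None
    return (page_size * page_count) // (1024 * 1024)


print("CPUs:", available_cpus())
print("Total memory (MB):", physical_memory_mb())
```

Because other programs also use memory, the value entered for “Max. Memory (MB)” would normally be somewhat lower than the total reported here.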

8. In the “SATé settings” window, you can use Quick Set to select “SATé-II-fast”; this will set all the settings appropriately. Alternatively, you can modify the settings as follows. Select the “Size” radio button in the “Max. Subproblem” field and a size of 200 in the dropdown menu. Set the decomposition to “centroid” (using “Longest” will slow down the analysis, and it should only be run with Opal, which should not be run with large datasets). Set “Apply Stop Rule” to either “After Launch” (for very large datasets) or “After Last Improvement”. Do not select “Blind Mode Enabled” if your dataset is very large. It is also probably not a good idea to use a time limit for the stopping rule if your dataset is very large, since a single iteration may not complete in the time you pick. We therefore recommend picking an iteration limit instead. The number of iterations should depend on your dataset, but for very large datasets it may be best to use a small number (say, 2). If these complete quickly, you can always use the output alignment and tree to initialize another SATé run. We recommend setting “Return” to “Best”.
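The recommendations in steps 6 and 8 amount to a handful of rules that can be written down as a checklist. The following is a minimal Python sketch of such a checklist. It is not a SATé configuration format: the class SateSettings, its field names, and the function check_settings are hypothetical labels for the GUI choices, and whether a dataset counts as “very large” is left to the user as a flag because the text gives no threshold.

```python
from dataclasses import dataclass


@dataclass
class SateSettings:
    # Hypothetical labels for GUI choices; defaults follow the text.
    merger: str = "muscle"
    decomposition: str = "centroid"
    max_subproblem_size: int = 200
    stop_rule: str = "iterations"   # "iterations" or "time"
    iteration_limit: int = 2
    blind_mode: bool = False
    extra_raxml_search: bool = False
    return_choice: str = "best"


def check_settings(s, very_large):
    """Return a list of warnings for choices the protocol advises against."""
    warnings = []
    if s.decomposition == "longest" and s.merger != "opal":
        warnings.append("'Longest' decomposition should only be run with Opal.")
    if very_large:
        if s.merger == "opal":
            warnings.append("Opal should not be run with large datasets.")
        if s.blind_mode:
            warnings.append("Blind mode is not advised for very large datasets.")
        if s.stop_rule == "time":
            warnings.append("Prefer an iteration limit to a time limit.")
        if s.extra_raxml_search:
            warnings.append("The extra RAxML search may be too costly.")
    return warnings


# The defaults above follow the protocol, so no warnings are expected.
for message in check_settings(SateSettings(), very_large=True):
    print("WARNING:", message)
```

Writing the rules out this way makes the dependencies visible: the choice of decomposition constrains the merger, and the dataset size constrains the merger, the stopping rule, blind mode, and the extra RAxML search.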
