“Tree Estimator” method. Only RAxML and FastTree are enabled
for estimating trees from alignments, and FastTree is the default.
Both are heuristics for maximum likelihood, which is a computationally
hard problem. FastTree is much faster than RAxML and
generally produces trees of very similar accuracy [34]. Furthermore,
in our unpublished studies, using FastTree instead of
RAxML within SATé produces alignments of comparable accuracy
and trees that are only slightly less accurate. Because of its
great speed advantage, however, we recommend the use of FastTree.
If FastTree is used, a final RAxML run can be applied to the
output alignment in order to obtain a RAxML tree (and thus
potentially improved accuracy).
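As a rough illustration of that final RAxML step, the Python sketch below only builds the RAxML command line for SATé's output alignment; it does not run it. It assumes the classic `raxmlHPC` executable and uses the standard RAxML options `-s` (alignment file), `-n` (run name), `-m` (substitution model) and `-p` (random seed for the parsimony starting tree). The file name `sate_output.aln`, the run name `sate_final` and the seed are hypothetical placeholders; check the options against the manual for your installed RAxML version.

```python
import shlex
from typing import List

# RAxML nucleotide model strings mentioned in this section (illustrative, not exhaustive).
RAXML_NUCLEOTIDE_MODELS = {"GTRCAT", "GTRGAMMA", "GTRGAMMAI"}


def final_raxml_command(alignment: str,
                        run_name: str = "sate_final",  # hypothetical run name
                        model: str = "GTRGAMMA",
                        seed: int = 12345,             # arbitrary parsimony seed
                        binary: str = "raxmlHPC") -> List[str]:
    """Build, but do not execute, a RAxML command for the final SATé alignment."""
    if model not in RAXML_NUCLEOTIDE_MODELS:
        raise ValueError(
            f"model {model!r} is not one of {sorted(RAXML_NUCLEOTIDE_MODELS)}")
    return [
        binary,
        "-s", alignment,   # input alignment (SATé's output alignment)
        "-n", run_name,    # suffix appended to RAxML's output file names
        "-m", model,       # substitution model string
        "-p", str(seed),   # random seed for the parsimony starting tree
    ]


if __name__ == "__main__":
    cmd = final_raxml_command("sate_output.aln")  # hypothetical file name
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Running the script prints `raxmlHPC -s sate_output.aln -n sate_final -m GTRGAMMA -p 12345`. GTRGAMMA is the default here only because GAMMA is the setting usually used in phylogenetic analyses; passing `model="GTRCAT"` trades some potential accuracy for speed.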
Substitution model. This refers to the statistical model [29] used by
the maximum likelihood method (RAxML or FastTree) to estimate
trees from alignments. The choice of statistical model depends on
whether your data are nucleotide or amino-acid sequences, and also
on whether you are using RAxML or FastTree as the tree estimator,
since the two support somewhat different models. For nucleotide data,
the default using RAxML is GTRCAT, while the default using
FastTree is GTR + G20. GTR refers to the General Time Reversible
model, the most general nucleotide substitution model
available within SATé. G20 and CAT refer to how rate variation
across sites is modeled: G20 is the GAMMA
distribution approximated by 20 rate categories, while CAT [35] is
a heuristic approximation to the GAMMA rate-variation model.
Alternative settings for RAxML include GTRGAMMA (GTR +
GAMMA) and GTRGAMMAI (GTR + GAMMA + invariable sites).
Alternative settings for FastTree include JC (the Jukes-Cantor
model) [36] instead of GTR, but this simplified model is not
recommended except in the unusual case where
the data appear to fit the Jukes-Cantor model best (unlikely for
most data). Note that the GAMMA setting is the one usually used in
phylogenetic analyses, whereas the CAT setting improves speed at a
potential cost in phylogenetic accuracy. For amino-acid datasets,
the choice of substitution model is more complicated; see the
section below on Amino-Acid Datasets for more information.
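To make "GAMMA approximated by 20 rate categories" concrete, the Python sketch below computes discrete-gamma category rates following the median variant of Yang's discrete-gamma approximation: a gamma distribution with shape α and mean 1 is split into k equally probable categories, each category is represented by its median rate, and the rates are rescaled to average 1. The shape value and the choice of the median (rather than mean) variant are illustrative assumptions; this is not FastTree's or RAxML's internal code, and the CAT heuristic is not shown. It uses only the standard library.

```python
import math
from typing import List


def gamma_cdf(x: float, alpha: float) -> float:
    """CDF at x of a gamma distribution with shape alpha and mean 1 (rate = alpha)."""
    if x <= 0.0:
        return 0.0
    z = alpha * x  # rescale to a gamma variable with rate 1
    # Series for the regularized lower incomplete gamma function P(alpha, z):
    # P = z^alpha e^(-z) / Gamma(alpha) * sum_n z^n / (alpha (alpha+1) ... (alpha+n))
    term = 1.0 / alpha
    total = term
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= z / (alpha + n)
        total += term
    log_prefactor = alpha * math.log(z) - z - math.lgamma(alpha)
    return min(1.0, math.exp(log_prefactor) * total)


def gamma_quantile(p: float, alpha: float) -> float:
    """Invert gamma_cdf by bisection: smallest x with CDF(x) >= p."""
    lo, hi = 0.0, 1.0
    while gamma_cdf(hi, alpha) < p:  # grow the bracket until it contains p
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, alpha) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha: float, k: int = 20) -> List[float]:
    """k equal-probability categories, each represented by its median, rescaled to mean 1."""
    # The median of category i (0-based) sits at cumulative probability (2i + 1) / (2k).
    medians = [gamma_quantile((2 * i + 1) / (2 * k), alpha) for i in range(k)]
    scale = k / sum(medians)
    return [r * scale for r in medians]


if __name__ == "__main__":
    for r in discrete_gamma_rates(alpha=0.5, k=20):  # alpha = 0.5 is an arbitrary example
        print(f"{r:.4f}")
```

Smaller shape values α spread the 20 rates further apart (strong rate heterogeneity across sites), while large α pushes all rates toward 1. Representing the distribution by a fixed number of categories is what makes the GAMMA model tractable, and also why it costs more than the CAT approximation.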
Maximum subproblem size. This is the maximum allowed size of the
subsets of sequences, and so determines how many times the
decomposition strategy is applied. The default depends on the
dataset size (and will be set by SATé after you input your data).
However, the main consideration in setting the maximum subproblem size
is the method used to align subsets. When MAFFT is the aligner,
keeping the maximum subproblem size at
200 or less allows the most accurate version of MAFFT (L-INS-i) to be
used to align the subsets, and this results in the best accuracy. If you
wish to use Prank instead of MAFFT to align subsets, the maximum
subproblem size should be reduced substantially, because Prank is