1. Prepare your dataset by creating a new folder and saving the
sequence data for each gene (or marker) in a separate
FASTA-formatted file in that folder. Each file name must end
with the suffix .fasta or .fas. Make sure that the set of taxon
names is identical across all of the FASTA files.
2. Begin by following step 1 from the “Basic Analysis” section.
3. Click the “Multi-Locus Data” checkbox in the “Sequences and
Tree” pane. Notice that the “Sequence file” dialog changes
into the “Sequence files” dialog. Click the “Sequence files”
button and choose the folder containing the input files.
4. Now run the analysis by following steps 3 through 9 in the
“Basic Analysis” section.
5. After the analysis finishes, the output files will be saved to the
output directory. The file names and descriptions will match
Table 2, with one exception. For an analysis with job name
“myjob” and input files named “geneA.fasta”, “geneB.fasta”,
“geneC.fasta”, and so on, SATé saves the output alignments in
files named myjob.marker001.geneA.aln,
myjob.marker002.geneB.fasta.aln, myjob.marker003.geneC.fasta.aln,
and so on.
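The requirements in step 1 can be checked before starting the run. The following is a minimal sketch and not part of SATé: the function names are hypothetical, and it assumes that a taxon name is the full text after ">" on each FASTA header line. It reports, for each .fasta or .fas file in a folder, the taxon names that appear in other files but are missing from that one.

```python
from pathlib import Path

# Suffixes that step 1 requires for the input files.
FASTA_SUFFIXES = (".fasta", ".fas")


def read_taxon_names(fasta_path):
    """Return the set of names found on the '>' header lines of one file.

    Assumption: the taxon name is the whole header line after '>'.
    """
    names = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                names.add(line[1:].strip())
    return names


def check_multilocus_folder(folder):
    """Map each FASTA file name to the taxon names it is missing.

    An empty result means every file has the same set of taxon names.
    """
    files = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix in FASTA_SUFFIXES
    )
    if not files:
        raise ValueError("no .fasta or .fas files found in " + str(folder))
    names_by_file = {p.name: read_taxon_names(p) for p in files}
    # Every name seen in any file; a file lacking one of them is a problem.
    all_names = set().union(*names_by_file.values())
    problems = {}
    for file_name, names in names_by_file.items():
        missing = all_names - names
        if missing:
            problems[file_name] = sorted(missing)
    return problems
```

Calling check_multilocus_folder on the input folder and getting an empty dictionary back suggests that the files are ready for the “Sequence files” dialog; any entries name the files and taxa to fix first.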
8.8 Advanced Analysis: Multi-gene Datasets
9 Summary and Related Work
SATé is a method for large-scale alignment and tree estimation that
has been shown to give very good results on both biological and
simulated datasets, for both nucleotide and amino-acid sequences.
However, the reasons for its good performance are subtle: for
example, it is not the case that allowing the alignment to change
arbitrarily and seeking the alignment with the best maximum
likelihood score (treating gaps as missing data) will lead to good
trees [75]. Instead, the benefits of using SATé arise because
alignment methods with great accuracy but poor scalability can be
used to estimate alignments on small subsets of the sequence dataset,
and the resultant subset alignments can then be merged into an
alignment on the full dataset. This design strategy means that SATé
can continue to improve in accuracy as new alignment methods are
developed. Similarly, as better tree estimation methods are developed
(including ones that might use gap events in a more informative
manner), SATé can continue to improve in accuracy and/or scalability
through the incorporation of these improved methods.
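The plug-in character of this design can be illustrated with a small sketch. It is not SATé's implementation: the function names are hypothetical, the subsets are supplied by the caller rather than derived from a tree, and the two stand-in functions only pad sequences with gaps, so they are placeholders for a real subset aligner and a real alignment merger.

```python
def divide_align_merge(sequences, subsets, align_subset, merge_alignments):
    """Align each subset independently, then merge the subset alignments.

    sequences: dict mapping taxon name -> unaligned sequence
    subsets: list of lists of taxon names that together cover all taxa
    align_subset: callable(dict) -> dict of equal-length aligned rows
    merge_alignments: callable(list of dicts) -> dict for the full dataset
    """
    subset_alignments = []
    for taxa in subsets:
        subset = {name: sequences[name] for name in taxa}
        subset_alignments.append(align_subset(subset))
    return merge_alignments(subset_alignments)


def pad_right(rows):
    """Toy stand-in: pad rows with '-' to equal length (not a real aligner)."""
    width = max(len(seq) for seq in rows.values())
    return {name: seq.ljust(width, "-") for name, seq in rows.items()}


def pad_and_combine(alignments):
    """Toy stand-in: pool the subset alignments and pad them to one width."""
    combined = {}
    for alignment in alignments:
        combined.update(alignment)
    return pad_right(combined)


if __name__ == "__main__":
    seqs = {"a": "ACGT", "b": "AC", "c": "ACGTAA", "d": "A"}
    full = divide_align_merge(
        seqs, [["a", "b"], ["c", "d"]], pad_right, pad_and_combine
    )
    for name in sorted(full):
        print(name, full[name])
```

Because the aligner and the merger are passed in as arguments, a better method can replace either one without changing the surrounding procedure, which is the property the paragraph above describes.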
Alternative approaches to large-scale phylogeny estimation that
do not require the estimation of a multiple sequence alignment
have also been developed; of these, DACTAL [14] has been shown
to give results that are almost as accurate as SATé, while being able
to run on very large datasets. However, DACTAL is not completely
alignment-free; instead, it computes alignments and trees on small