algorithm [5]. If the alignment guide phylogeny is likely to contain
errors or the input sequences are incomplete (i.e., contain missing
data), the option +F can be problematic, and the resulting alignment
should at least be compared to one produced without it.
Reproducibility: Most pairwise alignments have several equally
good solutions. In progressive alignment, the choice between
these alternative solutions may trigger larger changes in the later
stages of the process and lead to very different multiple alignments.
Most alignment methods are deterministic: they always pick the
same solution and are thus guaranteed to produce the same final
alignment. This practice hides the uncertainty in the data and has led to
post-processing methods that recover the hidden variation [13]. By
default, PRANK randomly picks one of the alternative solutions and
may produce different results on independent runs of the very same
data. This behavior can be disabled if reproducibility is required.
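The effect of equally good solutions can be illustrated with a toy pairwise aligner. The sketch below is not PRANK's algorithm: it is a plain Needleman-Wunsch traceback with made-up scores (MATCH, MISMATCH, GAP are assumptions), and the function names are hypothetical. It only shows how a fixed tie-breaking rule always returns the same alignment, while a random choice among tied moves can return any of the co-optimal ones, and how a seeded random generator restores reproducibility.

```python
import random

# Toy scoring scheme; these values are assumptions chosen for illustration.
MATCH, MISMATCH, GAP = 1, -1, -1


def score_matrix(a, b):
    """Fill a Needleman-Wunsch score matrix with a linear gap cost."""
    rows, cols = len(a) + 1, len(b) + 1
    m = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        m[i][0] = i * GAP
    for j in range(1, cols):
        m[0][j] = j * GAP
    for i in range(1, rows):
        for j in range(1, cols):
            sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            m[i][j] = max(m[i - 1][j - 1] + sub,
                          m[i - 1][j] + GAP,
                          m[i][j - 1] + GAP)
    return m


def optimal_moves(m, a, b, i, j):
    """List every traceback move from cell (i, j) that preserves the optimal score."""
    moves = []
    if i > 0 and j > 0:
        sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
        if m[i][j] == m[i - 1][j - 1] + sub:
            moves.append("diag")
    if i > 0 and m[i][j] == m[i - 1][j] + GAP:
        moves.append("up")
    if j > 0 and m[i][j] == m[i][j - 1] + GAP:
        moves.append("left")
    return moves


def traceback(a, b, rng=None):
    """Return one optimal alignment of a and b.

    Without rng, ties are always broken the same way (first listed move).
    With rng, one of the tied moves is chosen at random.
    """
    m = score_matrix(a, b)
    i, j = len(a), len(b)
    top, bottom = [], []
    while i > 0 or j > 0:
        moves = optimal_moves(m, a, b, i, j)
        move = rng.choice(moves) if rng is not None else moves[0]
        if move == "diag":
            i, j = i - 1, j - 1
            top.append(a[i])
            bottom.append(b[j])
        elif move == "up":
            i -= 1
            top.append(a[i])
            bottom.append("-")
        else:
            j -= 1
            top.append("-")
            bottom.append(b[j])
    return "".join(reversed(top)), "".join(reversed(bottom))
```

For the sequences AAG and AG, the gap can be placed against either A with the same score, so `traceback("AAG", "AG")` always returns the same one of the two solutions, whereas `traceback("AAG", "AG", random.Random())` may return either. Passing a generator with a fixed seed, such as `random.Random(1)`, makes the random variant repeatable.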
Sequence alphabet: PRANK represents sites in ancestral sequences with
vectors of conditional likelihoods for the descendant sub-tree given
different character states at the parent. This requires O(A^2)
computations for each cell in the dynamic programming matrix, where
A is the size of the character alphabet, and makes the alignment of
sequences with a large alphabet relatively slow. For protein-coding
sequences, alignments performed at the codon level have been
shown to outperform those done on protein sequences [10, 11].
Despite its slower computation, the use of codon alignment is
recommended whenever possible. In general, protein-coding
DNA sequences should not be aligned as DNA without good
reason. If codon alignment is found to be too slow, PRANK provides
an option to translate protein-coding DNA sequences to protein,
perform the alignment on protein sequences, and back-translate
the resulting alignment to DNA.
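The translate, align, and back-translate route described above can be sketched for its last step. The code below is a minimal illustration, not PRANK's implementation: the function names are hypothetical, and it assumes that each unaligned DNA sequence contains exactly one codon per residue of its aligned protein sequence (for example, with a terminal stop codon removed from both). Each residue is replaced by its original codon and each protein gap by three gap characters, so the protein alignment columns are preserved in the DNA alignment.

```python
def back_translate(aligned_protein, coding_dna, gap="-"):
    """Thread the codons of coding_dna onto a gapped protein sequence.

    Assumes coding_dna has exactly three nucleotides per non-gap residue.
    """
    residues = sum(1 for c in aligned_protein if c != gap)
    if len(coding_dna) != 3 * residues:
        raise ValueError("DNA length must be three times the residue count")
    # Split the DNA into consecutive codons and consume them in order.
    codons = (coding_dna[i:i + 3] for i in range(0, len(coding_dna), 3))
    return "".join(gap * 3 if c == gap else next(codons)
                   for c in aligned_protein)


def back_translate_alignment(protein_alignment, dna_sequences, gap="-"):
    """Back-translate every row of a protein alignment (name -> gapped string)."""
    return {name: back_translate(row, dna_sequences[name], gap)
            for name, row in protein_alignment.items()}
```

As a usage example, `back_translate("M-K", "ATGAAA")` yields `"ATG---AAA"`. Because only the protein sequences are aligned, this route cannot use the nucleotide-level signal that a true codon alignment exploits.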
Sequence sampling: Provided that the alignment guide phylogeny is
correct and the sequence sampling is dense, PRANK is unbiased and
scales up to any number of sequences. Even if the question at hand
does not require an alignment of a large number of sequences, the
quality of the resulting alignment is expected to be better when it is
performed for many closely related sequences than for a small
number of distantly related ones. Unneeded sequences can be
removed after the alignment without affecting the statements of
homology among the remaining sequences. PRANK is not suitable
for the alignment of highly diverged sequences.
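Removing unneeded sequences after alignment, as suggested above, can be sketched as follows. This is a minimal illustration with a hypothetical function name, assuming the alignment is held as a mapping from sequence name to gapped string of equal length. Dropping columns that contain only gaps after the removal is a tidying choice made here, not something the text prescribes; it does not change which residues of the remaining sequences share a column.

```python
def prune_alignment(alignment, keep, gap="-"):
    """Keep only the named rows and drop columns left with gaps only."""
    rows = {name: alignment[name] for name in keep}
    # Transpose to columns and retain those with at least one residue.
    columns = [col for col in zip(*rows.values())
               if any(c != gap for c in col)]
    # Transpose back; k is the position of each row within a column tuple.
    return {name: "".join(col[k] for col in columns)
            for k, name in enumerate(rows)}
```

For example, pruning sequence "c" from `{"a": "AC-GT", "b": "A--GT", "c": "ACTGT"}` leaves `{"a": "ACGT", "b": "A-GT"}`: the column that only "c" occupied disappears, and every remaining column pairs the same residues of "a" and "b" as before.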
6 Future Directions
PRANK has been shown to perform well in benchmarks assessing the
suitability of sequence alignments generated with various methods to
different types of evolutionary analyses [10-12]. Despite its good
performance in phylogenetic analyses, the method should be used