Phylogeny-aware alignment with PRANK - Multiple Sequence Alignment Methods

algorithm [5]. If the alignment guide phylogeny is likely to contain

errors or the input sequences are incomplete (i.e., contain missing

data), the option +F can be problematic and the resulting align-

ment should at least be compared to one produced without it.

Reproducibility: Most pairwise alignments have several equally

good solutions. In progressive alignment, the choice between

these alternative solutions may trigger larger changes in the later

stages of the process and lead to very different multiple alignments.

Most alignment methods are deterministic: they always pick the

same solution and are thus guaranteed to produce the same final alignment.

This practice hides the uncertainty in the data and has led to

post-processing methods to recover the hidden variation [13]. By

default, PRANK randomly picks one of the alternative solutions and

may produce different results on independent runs on the very same

data. This behavior may be disabled if reproducibility is required.
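
The effect of equally good solutions can be illustrated with a minimal sketch of a plain global pairwise alignment. This is not PRANK's algorithm or code: the function name `align`, the scoring values, and the `rng` parameter are assumptions made for illustration. When the traceback meets a tie, it either picks one option at random or, if no random generator is given, always takes the first one.

```python
import random

def align(a, b, match=1, mismatch=-1, gap=-1, rng=None):
    """Global alignment of a and b; returns (aligned_a, aligned_b, score).

    Ties in the traceback are broken with rng.choice() when rng is given,
    otherwise the first tied option is always taken (deterministic).
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Traceback: collect every move that explains the cell's score.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        moves = []
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                moves.append("diag")
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            moves.append("up")
        if j > 0 and score[i][j] == score[i][j - 1] + gap:
            moves.append("left")
        move = rng.choice(moves) if rng is not None else moves[0]
        if move == "diag":
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif move == "up":
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]

# "AAT" against "AT": the gap can face either A at the same score.
solutions = {align("AAT", "AT", rng=random.Random(seed))[:2] for seed in range(50)}
print(sorted(solutions))
# A fixed seed (or rng=None) returns the same alignment on every run.
print(align("AAT", "AT", rng=random.Random(7)) == align("AAT", "AT", rng=random.Random(7)))
```

In a progressive alignment, each such choice is passed on to later alignment steps, which is why two runs can diverge. Fixing the seed, or always taking the first option, restores reproducibility but hides the alternative solutions.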

Sequence alphabet: PRANK represents sites in ancestral sequences with

vectors of conditional likelihoods for the descendant sub-tree given

different character states at the parent. This requires O(A^2) computations

for each cell in the dynamic programming matrix, where

A is the size of the character alphabet, and makes the alignment of

sequences with a large alphabet relatively slow. For protein-coding

sequences, alignment performed at the codon level has been

shown to outperform alignment of protein sequences [10, 11].

Despite its slower computation, the use of codon alignment is

recommended whenever possible. In general, protein-coding

DNA sequences should not be aligned as DNA without good

reason. If codon alignment is found to be too slow, PRANK provides

an option to translate protein-coding DNA sequences to protein,

perform the alignment on protein sequences, and back-translate

the resulting alignment to DNA.
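
The back-translation step described above can be sketched as follows. This is an illustration of the general idea rather than PRANK's implementation: the function name `back_translate` and the example sequences are hypothetical, and the sketch assumes that each DNA sequence is in frame, contains exactly three nucleotides per aligned residue, and has no trailing stop codon.

```python
def back_translate(aligned_protein, dna, gap="-"):
    """Thread unaligned coding DNA onto an aligned protein sequence.

    Each residue is replaced by its original codon and each gap
    character by three gap characters.
    """
    n_residues = sum(1 for c in aligned_protein if c != gap)
    if len(dna) != 3 * n_residues:
        raise ValueError("DNA length does not match the number of residues")
    codons = (dna[i:i + 3] for i in range(0, len(dna), 3))
    return "".join(gap * 3 if c == gap else next(codons)
                   for c in aligned_protein)

# Hypothetical protein alignment and the coding sequences behind it.
protein_alignment = {"seq1": "MK-L", "seq2": "MKAL"}
coding_dna = {"seq1": "ATGAAACTG", "seq2": "ATGAAGGCTCTT"}

codon_alignment = {name: back_translate(protein_alignment[name], coding_dna[name])
                   for name in protein_alignment}
for name, seq in codon_alignment.items():
    print(name, seq)
```

Because gaps are placed in whole-codon units, the reading frame is preserved, but the gap positions are still decided at the protein level. The result is therefore not equivalent to an alignment computed with a codon model.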

Sequence sampling: Provided that the alignment guide phylogeny is

correct and the sequence sampling is dense, PRANK is unbiased and

scales up to any number of sequences. Even if the question at hand

does not require an alignment of a large number of sequences, the

quality of the resulting alignment is expected to be better when it is

performed for many closely-related sequences than for a small

number of distantly-related ones. Unneeded sequences can be

removed after the alignment without affecting the statement of

homology among the remaining sequences. PRANK is not suitable

for the alignment of highly diverged sequences.
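
Removing unneeded sequences after the alignment can be sketched as below. The function name `prune_alignment` and the example data are hypothetical, and the alignment is assumed to be held as a dictionary of equal-length gapped strings. Columns that contain only gaps once the extra sequences are gone are dropped, which is a common cleanup step and not a PRANK feature.

```python
def prune_alignment(alignment, keep, gap="-"):
    """Keep only the named sequences and drop columns left with gaps only.

    The residues that shared a column before pruning still share a
    column afterwards, so the homology statements among the kept
    sequences are unchanged.
    """
    kept = {name: alignment[name] for name in keep}
    columns = zip(*kept.values())
    retained = [col for col in columns if any(c != gap for c in col)]
    return {name: "".join(col[i] for col in retained)
            for i, name in enumerate(kept)}

# Hypothetical alignment: "dense1" and "dense2" were included only to
# improve the alignment and are not needed in the downstream analysis.
alignment = {
    "target1": "AC--GT",
    "target2": "A-T-GT",
    "dense1":  "ACTAGT",
    "dense2":  "ACT-GT",
}
pruned = prune_alignment(alignment, ["target1", "target2"])
for name, seq in pruned.items():
    print(name, seq)
```

Only the column that held an insertion unique to a removed sequence disappears. The remaining columns keep their original pairing of residues.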

6 Future Directions

PRANK has been shown to perform well in benchmarks assessing the

suitability of sequence alignments generated with various methods to

different types of evolutionary analyses [10-12]. Despite its good

performance in phylogenetic analyses, the method should be used
