Phylogeny-aware alignment with PRANK - Multiple Sequence Alignment Methods


This chapter focuses on evolutionary sequence alignment and the use of a phylogeny-aware alignment algorithm to infer alignments for evolutionary studies. The definition of evolutionary homology is central, so we start by discussing it and its correct representation in multiple sequence alignments. We then introduce (with lots of figures and no equations) the phylogeny-aware alignment algorithm implemented in PRANK. After detailing the strengths and weaknesses of the method, we will see what this means in practice and give advice for the use of PRANK. We finish with a brief discussion of the future plans for PRANK and related methods.

In the following sections, some methods based on the classical progressive algorithm are criticized and shown to perform poorly. This criticism is based on their performance in evolutionary analyses only and, as demonstrated in other chapters of this book, the alignments they produce can be suitable for other types of analyses. Similarly, the phylogeny-aware algorithm may perform poorly in non-evolutionary alignment tasks, for which alternative methods should be used.

2 Evolutionary Homology in Sequence Alignment

A multiple sequence alignment represents sitewise homology among the characters in different sequences. The type of homology denoted by the alignment depends on the application and the intended use of the data: when the alignment is used for evolutionary analyses, the characters placed in the same column are believed to be evolutionarily homologous, that is, to share a common ancestor. Evolutionary homology is not the same as structural homology, and the difference between the two homology types is most clearly evidenced by the role of insertions. Two independent insertions at the same position can lead to identical changes in the structure, and the characters inserted independently may thus be considered structurally homologous. In contrast, independent insertions, even at exactly the same position, do not share a common ancestor and can never be evolutionarily homologous. To correctly indicate the evolutionary homology of insertions, the characters descending from different insertion events should be placed in separate alignment columns (Fig. 1).
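The column rule for independent insertions can be illustrated with a minimal Python sketch. This is a toy representation and not PRANK's algorithm: it assumes the true origin of every character is already known, which a real aligner has to infer. The sequence names, the origin labels (`a1`, `insA`, `insC`, `a2`), and the function `align_by_origin` are all hypothetical names introduced for illustration.

```python
from typing import Dict, List, Tuple

# Each residue is (origin_id, character). Residues sharing an origin_id descend
# from the same ancestral character; every independent insertion event gets its
# own origin_id. All labels below are hypothetical toy data.
Residue = Tuple[str, str]


def align_by_origin(
    seqs: Dict[str, List[Residue]], column_order: List[str]
) -> Dict[str, str]:
    """Build one alignment column per origin, padding with gaps where absent."""
    rows = {}
    for name, residues in seqs.items():
        by_origin = dict(residues)  # origin_id -> character for this sequence
        rows[name] = "".join(by_origin.get(origin, "-") for origin in column_order)
    return rows


# Sequences A and C each gained a "G" at the same position, but through two
# independent insertion events (insA and insC); B has no insertion.
seqs = {
    "A": [("a1", "C"), ("insA", "G"), ("a2", "T")],
    "B": [("a1", "C"), ("a2", "T")],
    "C": [("a1", "C"), ("insC", "G"), ("a2", "T")],
}
column_order = ["a1", "insA", "insC", "a2"]

alignment = align_by_origin(seqs, column_order)
for name, row in alignment.items():
    print(name, row)
```

The two inserted characters are identical and occupy the same position, yet they end up in separate columns because they do not share an ancestor. An alignment that stacked them in a single column would instead imply one shared insertion.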

If one restricts the analysis to relatively short sequence fragments, one can assume that the sequences evolve by substitutions, insertions, and deletions only. The three processes can be assumed to occur at relatively constant rates (although the rate is distinct for each process), substitutions typically being at least an order of magnitude more common than insertions and deletions [2]. The three processes differ greatly in their effect on the sequences and on one's ability to infer the events from the data: (1) a character at a
