Biology Reference
In-Depth Information
Even if the mathematically optimal result has been calculated,
it may not be biologically correct. Unsatisfactory results are often obtained
if large portions of a gene are missing in some of the sequences
(the “?” in Fig. 2). The insertion or deletion of an entire protein
domain can lead to prohibitively large penalties which would not
allow the correct alignment to be found. This is another reason for
manually reviewing and correcting alignments.
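To illustrate why a domain-sized insertion or deletion is so costly, the following minimal sketch assumes an affine gap model (an opening cost plus a per-residue extension cost). The text does not say which gap model is used; the function name `affine_gap_cost` and the parameter values are hypothetical placeholders, not taken from any particular program.

```python
def affine_gap_cost(length, gap_open=10.0, gap_extend=0.5):
    """Cost of one contiguous gap of `length` residues.

    Assumed model: the first gapped residue costs `gap_open`,
    every further residue costs `gap_extend`. Both defaults are
    illustrative values only.
    """
    if length <= 0:
        return 0.0
    return gap_open + gap_extend * (length - 1)


if __name__ == "__main__":
    # Compare a short indel with a gap the size of a whole protein domain.
    for gap_length in (1, 3, 150):
        print(gap_length, affine_gap_cost(gap_length))
```

Under these assumed values, a gap spanning a whole domain costs several times more than a short indel, so an aligner may prefer a string of mismatches over the single long gap that would be biologically correct.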
Insertions and deletions are not the only evolutionary events that
take place. Complex rearrangements, more akin to macromutational
events, also happen, but these are rarely taken into account
by multiple-alignment software.
The sheer complexity and variability of genetic information and
biological mechanisms can lead to further problems. These
include synonymous codons and the underlying protein sequence
in coding regions; pseudogenes; mRNA editing; splicing
boundaries in eukaryotic genes; regulatory regions such as
promoters; and secondary and tertiary structures in proteins, rRNA,
and tRNA. Each dataset has its own supplementary
constraints that can be leveraged to improve the alignment, but
no single tool can exploit them all.
Once a multiple alignment has been created, a second step is
performed to select the columns of the alignment that are to be included
in the phylogenetic analysis. One must be certain that the characters are
rich in information and that the alignment is reliable. Since there will
always be a mathematically optimal alignment, even for purely random
sequences, it is necessary to take some preliminary precautions.
Generally, only molecular sequences that are clearly identifiable and have
roughly the same length are used. Regions coding for proteins fit the bill
perfectly, especially if a secondary, tertiary, or even quaternary structure
is known and can be used to improve homology assumptions. This allows
for a functional verification of alignment results.
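One possible way to carry out the column-selection step is to discard columns that contain too many gaps or missing characters. The sketch below is only an illustration of that idea, not the procedure prescribed by the text: the function name `select_columns`, the 20% threshold, and the treatment of "-" and "?" as gap or missing symbols are all assumptions.

```python
def select_columns(alignment, max_gap_fraction=0.2, gap_chars="-?"):
    """Return the indices of alignment columns kept for analysis.

    `alignment` is a list of equal-length aligned sequences (strings).
    A column is kept when the fraction of gap or missing characters
    in it does not exceed `max_gap_fraction` (an assumed threshold).
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    kept = []
    for col in range(length):
        column = [seq[col] for seq in alignment]
        gaps = sum(1 for ch in column if ch in gap_chars)
        if gaps / len(column) <= max_gap_fraction:
            kept.append(col)
    return kept


if __name__ == "__main__":
    toy_alignment = ["ACG-T", "AC-GT", "A?GGT"]  # made-up example data
    columns = select_columns(toy_alignment)
    print(columns)
    # Build the reduced alignment from the retained columns only.
    print(["".join(seq[i] for i in columns) for seq in toy_alignment])
```

A gap-fraction filter addresses only reliability; whether the retained characters are rich in information has to be assessed separately.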
Several approaches can be used to approximate the information
content of the selected characters. A simple one is to perform an exhaustive