Biology Reference
In-Depth Information
built from putative orthologous sequences, so that the resulting
test trees can be expected to have the same topology as the under-
lying species tree. Each resulting tree is then compared to a refer-
ence species tree, comprising sufficiently divergent species that its
branching order is deemed uncontroversial. The best performing
aligners are taken to be those that most consistently generate
alignments that yield test trees congruent with the species tree.
Indeed, it can be expected that averaged over many hundreds or
thousands of families, discordance due to non-orthology among
the input sequences will affect the performance of all aligners
equally, whereas discordance due to alignment error will vary
among aligners.
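
As an illustration of the comparison step, the following minimal sketch scores a set of test trees by how often they match a reference species tree. It rests on assumptions not stated in the text: trees are written as nested tuples of species names, every family contributes exactly one sequence per reference species, and "congruent" means having identical unrooted splits. The function names are hypothetical.

```python
from typing import FrozenSet, Sequence, Set, Tuple, Union

# Assumed representation: a leaf is a species name, an internal node is a tuple.
Tree = Union[str, Tuple["Tree", ...]]
Split = FrozenSet[FrozenSet[str]]


def bipartitions(tree: Tree) -> Set[Split]:
    """Collect the non-trivial splits of the tree, ignoring where it is rooted."""
    splits: Set[Split] = set()

    def leaf_set(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(leaf_set(child) for child in node))

    all_leaves = leaf_set(tree)

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        rest = all_leaves - clade
        # Keep only splits with at least two leaves on each side.
        if len(clade) > 1 and len(rest) > 1:
            splits.add(frozenset([clade, rest]))
        return clade

    visit(tree)
    return splits


def is_congruent(test_tree: Tree, species_tree: Tree) -> bool:
    """True if both trees induce exactly the same set of splits."""
    return bipartitions(test_tree) == bipartitions(species_tree)


def congruence_rate(test_trees: Sequence[Tree], species_tree: Tree) -> float:
    """Fraction of test trees that are congruent with the species tree."""
    if not test_trees:
        raise ValueError("at least one test tree is required")
    hits = sum(is_congruent(t, species_tree) for t in test_trees)
    return hits / len(test_trees)
```

Under these assumptions, an aligner would be ranked by the value of `congruence_rate` obtained over all families it aligned; a distance-based score could be substituted for the strict equality used here.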
The second method, termed the “minimum duplication” test,
invokes a parsimony argument to interpret trees built from align-
ments of both orthologous and paralogous sequences: trees that
require fewer gene duplications to explain the data are taken to be
more likely to reflect the true evolutionary history of the sequences.
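
The text does not say how duplications are counted, so the sketch below uses one common heuristic as a stand-in: a node of a rooted gene tree is counted as a duplication when the species sets below two of its children overlap. The nested-tuple tree format, the `species_gene` leaf naming, and the function names are all assumptions made for illustration.

```python
from typing import Callable, FrozenSet, Tuple, Union

# Assumed representation: a leaf is a gene label, an internal node is a tuple.
GeneTree = Union[str, Tuple["GeneTree", ...]]


def species_from_label(label: str) -> str:
    """Hypothetical naming scheme: 'human_2' is gene 2 from species 'human'."""
    return label.split("_")[0]


def count_duplications(
    tree: GeneTree,
    species_of: Callable[[str], str] = species_from_label,
) -> int:
    """Count nodes whose child subtrees share at least one species."""

    def visit(node: GeneTree) -> Tuple[FrozenSet[str], int]:
        if isinstance(node, str):
            return frozenset([species_of(node)]), 0
        seen: FrozenSet[str] = frozenset()
        duplications = 0
        overlap = False
        for child in node:
            child_species, child_dups = visit(child)
            duplications += child_dups
            if seen & child_species:
                overlap = True  # same species on both sides: duplication
            seen = seen | child_species
        return seen, duplications + (1 if overlap else 0)

    return visit(tree)[1]


# Two candidate trees for the same five genes.
tree_a = ((("human_1", "mouse_1"), ("human_2", "mouse_2")), "fly_1")
tree_b = ((("human_1", "human_2"), "mouse_1"), ("mouse_2", "fly_1"))
print(count_duplications(tree_a), count_duplications(tree_b))
```

Here `tree_a` needs a single duplication and `tree_b` needs two, so the parsimony argument would favor the alignment that produced `tree_a`.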
One key advantage of phylogenetic benchmarks is that they
provide a way of evaluating gap-rich and variable regions, regions
for which structural benchmarks are often not applicable and simu-
lation benchmarks lack realism [ 44 ]. In particular, the limited
applicability of structural benchmarks to conserved protein core
regions has quite possibly caused developers of alignment methods
to focus their efforts on improving the performance of their tools
on conserved regions at the expense of gap-rich or variable regions.
Yet focusing on conserved regions can result in a loss of potentially
informative data for multiple sequence alignment [ 21 ]. Adopting
a simple tree inference method that looks only at presence or
absence of gaps as a binary character within a maximum parsimony
framework, Dessimoz and Gil reported that gap-only trees are
sometimes even more accurate than nucleotide-based trees, thus
highlighting the signal lost in neglecting variable or gap-rich
regions [ 44 ].
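
The gap-only encoding described above can be sketched as follows. This covers only the recoding of an alignment into binary presence/absence characters, not the maximum parsimony tree search itself. The dictionary input format, the choice of `-` as the gap symbol, and the function names are assumptions.

```python
from typing import Dict, List, Tuple


def gap_characters(alignment: Dict[str, str], gap: str = "-") -> Dict[str, Tuple[int, ...]]:
    """Recode each aligned sequence as 1 (gap) or 0 (residue) per column."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) > 1:
        raise ValueError("all aligned sequences must have the same length")
    return {
        name: tuple(1 if symbol == gap else 0 for symbol in seq)
        for name, seq in alignment.items()
    }


def informative_columns(matrix: Dict[str, Tuple[int, ...]]) -> List[int]:
    """Indices of columns where both states occur in at least two sequences."""
    rows = list(matrix.values())
    if not rows:
        return []
    kept = []
    for col in range(len(rows[0])):
        ones = sum(row[col] for row in rows)
        zeros = len(rows) - ones
        # A binary character is parsimony-informative only if each state
        # is shared by two or more sequences.
        if ones >= 2 and zeros >= 2:
            kept.append(col)
    return kept


example = {
    "human": "ATG--CA",
    "mouse": "ATG--CA",
    "fly":   "ATGTTCA",
    "worm":  "A-GTTCA",
}
matrix = gap_characters(example)
print(informative_columns(matrix))
```

In this toy alignment only the two columns of the gap shared by human and mouse carry grouping signal; the single-sequence gap in worm is uninformative under parsimony.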
At present, phylogeny-based benchmarks are the only ones that
can be interpreted to be directly evaluating homology on real data.
The premise of this interpretation is that more accurate trees on
average necessarily ensue from a higher proportion of homologous
positions in alignments on average, and therefore that the former is
a good surrogate for the latter. Yet although we view the premise as
highly plausible (and indeed fail to see how one could argue the
opposite), there is no proof for it. If dismissed altogether, the
interpretation has to be weakened so that these phylogeny tests
only measure the effect of alignment on phylogenetic inference.
In that case, phylogeny-based benchmarks become less meaningful
for other homology-based applications of alignments, such as
detecting sites under positive selection [ 45 ].