appreciation of the importance of benchmarking measures and
datasets to evaluate and critically examine the performance of dif-
ferent MSA software packages, as underscored by a number of
recent articles addressing the subject [ 1 - 5 ].
At the same time, and despite these positive developments, the
standard approach adopted by the great majority of scientists dealing
with sequence alignment is still to rely on aligners that
have long been outperformed in benchmarks [ 6 ], or even manual
and therefore inevitably subjective intervention in the alignment
process [ 7 ]. It is unclear whether this is due to the simplicity of use
and convenience of long-standing aligners (“historical inertia” [ 7 ]),
reluctance to move away from customary practice, or unawareness
or even distrust of newer, lesser-tested technologies. This trend is
particularly worrying in light of the rapid spread of high-throughput
technologies and the associated need for automation of analysis
pipelines [ 8 ]. One reason for this state of affairs might lie in the
absence of a straightforward procedure for benchmarking alignments
and interpreting the results. In this chapter, we contribute to overcoming
this problem by reviewing present alignment benchmarks, aiming
to clarify their strengths and risks for MSA evaluation with a view
towards having better (and better-trusted) benchmarks in the
future. But before considering benchmarking strategies, we first
need to review the alignment objectives we expect them to gauge.
1.1 What Should Sequence Aligners Strive for?
A conceptual complication lies in the fact that MSAs have multiple and
potentially conflicting goals, depending on the biological question of
interest [ 9 ]. Commonly, the residues aligned are those inferred to be
related through homology, i.e., common ancestry. In other contexts,
however, the emphasis might be more on functional or structural
concordance among residues. A strictly evolutionary interpretation
of homology in these cases could be counterproductive, as also
recognized by Kemena and Notredame [ 1 ], since regions of the
protein that carry out the same function or that occupy the same
position in the three-dimensional conformation of the protein may
have arisen independently by evolutionary convergence. For example,
an alignment that pairs structurally analogous, but nonhomologous,
residues would be informative and therefore “correct” to the
structural biologist, although not so to the phylogeneticist. It
should however be noted that functional and structural objectives
are considerably less precise than the evolutionary objective: while
common ancestry is an absolute, binary attribute, similarity in
functional or structural role is a context-dependent, continuous
attribute, thus rendering any reduction to the aligned/unaligned
dichotomy subjective at best, ill-defined at worst.
At the same time, the unambiguous nature of the evolutionary
objective does not make it automatically easy to pursue (or, as we
shall see below, ascertain). Indeed, the evolutionary history of
biological sequences is mostly unknown and can only be inferred