6 Conclusions
Benchmarks for MSA applications have emerged in recent years as a crucial tool for bioinformaticians to keep a critical eye on existing software packages and to reliably diagnose areas that need further development. The routine use of benchmarks to assess the efficacy and accuracy of MSA methods has clearly provided important insights, and has alerted the developer community to serious shortcomings of existing methods that would not otherwise have been so apparent [4, 26, 44, 46]. Each benchmarking solution examined in this chapter, whether simulation-, consistency-, structure-, or phylogeny-based, entails risks of bias and error, but each is also useful in its own right when applied to a relevant problem. It is notable that simulation benchmarks rank MSA methods differently from empirical benchmarks [21, 46, 47]. Clearly, no single benchmark can be uniformly used to test different MSA methods. Instead, because of both the computational and the biological issues raised by the problem of sequence alignment optimization, a multiplicity of scenarios needs to be modelled in benchmark datasets.
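As an illustration of what an empirical, reference-based benchmark actually measures, the sketch below computes the two scores most commonly reported in such comparisons, the sum-of-pairs (SP) score and the total-column (TC) score, for a test alignment against a trusted reference. The alignments and the scoring code are hypothetical examples, not taken from any specific benchmark suite, and assume that both alignments contain the same sequences in the same order, with gaps written as '-'.

from itertools import combinations

def residue_columns(alignment):
    """For each alignment column, yield the set of (sequence index, residue index) pairs it aligns."""
    counters = [0] * len(alignment)
    for col in zip(*alignment):
        column = set()
        for seq_idx, char in enumerate(col):
            if char != '-':
                column.add((seq_idx, counters[seq_idx]))
                counters[seq_idx] += 1
        yield column

def sp_and_tc(reference, test):
    """Return (SP, TC): the fraction of reference residue pairs, and of whole
    reference columns, that the test alignment reproduces."""
    test_cols = list(residue_columns(test))
    test_pairs = {pair for col in test_cols for pair in combinations(sorted(col), 2)}
    test_col_set = {frozenset(col) for col in test_cols if col}

    ref_pairs = hit_pairs = ref_cols = hit_cols = 0
    for col in residue_columns(reference):
        if not col:
            continue
        ref_cols += 1
        if frozenset(col) in test_col_set:
            hit_cols += 1
        for pair in combinations(sorted(col), 2):
            ref_pairs += 1
            if pair in test_pairs:
                hit_pairs += 1
    sp = hit_pairs / ref_pairs if ref_pairs else 0.0
    tc = hit_cols / ref_cols if ref_cols else 0.0
    return sp, tc

if __name__ == "__main__":
    # Toy reference and a hypothetical aligner's output for the same three sequences.
    reference = ["AC-GT", "ACAGT", "AC-GT"]
    test      = ["ACG-T", "ACAGT", "ACG-T"]
    sp, tc = sp_and_tc(reference, test)
    print(f"SP = {sp:.3f}, TC = {tc:.3f}")   # SP = 0.833, TC = 0.600

A real benchmark run would of course draw on curated reference datasets and aggregate scores over many alignments per method, but SP and TC, or close variants of them, are the quantities that most empirical comparisons report.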
A telling symptom of the current state of affairs is the fact that subjective manual editing of sequence alignments remains widespread, reflecting perhaps an overall lack of confidence in the performance of automated multiple alignment strategies. The criteria used when editing sequence alignments “by eye” are vague and may introduce individual biases and aesthetic considerations into sequence alignment [9, 21].
Reproducibility of experimental results is one of the most important goals of scientific practice, and this trend therefore needs to change. Context-specific, effective benchmarking with well-defined objectives represents a sensible way forward.
Acknowledgments
The authors thank Julie Thompson for helpful feedback on the manuscript. CD is supported by SNSF advanced researcher fellowship #136461. This article started as an assignment for the graduate course “Reviews in Computational Biology” at the Cambridge Computational Biology Institute, University of Cambridge.
References
1. Kemena C, Notredame C (2009) Upcoming challenges for multiple sequence alignment methods in the high-throughput era. Bioinformatics 25(19):2455-2465
2. Aniba MR, Poch O, Thompson JD (2010) Issues in bioinformatics benchmarking: the case study of multiple sequence alignment. Nucleic Acids Res 38(21):7353-7363
 