all paired aligned residues are determined over all sequences in
every alignment. The overlap score for two alignments is calculated
by counting the aligned pairs present in both alignments,
and dividing by the average number of pairs in the alignments.
Hence, two almost identical alignments have an overlap score
close to one, while two very different alignments have an overlap
score close to zero. Two additional scores based on this concept are
the average overlap score, and the multiple overlap score. The
average overlap score is simply the mean of the overlap scores
measured over all pairs of input alignments, and serves as an indicator of the difficulty of the alignment problem (low agreement among aligners suggests a harder problem). The multiple overlap score is
a weighted sum of all pairs present in a single alignment, with the
weight determined by the number of times each pair appears in the
whole set of alignments. It is assumed that a high multiple overlap
score, gained by an alignment with a high proportion of commonly
observed pairs, corresponds to a good performance.
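As a concrete illustration, the sketch below shows one way these overlap-based scores could be computed in Python, assuming each alignment has already been reduced to a set of aligned residue pairs, each pair identifying a residue in one sequence matched to a residue in another. The function names, the pair representation, and the exact normalization of the multiple overlap score are illustrative choices made here, not details taken from the original publications.

from collections import Counter
from itertools import combinations

def overlap_score(aln1, aln2):
    """Aligned pairs shared by both alignments, divided by the average
    number of pairs in the two alignments."""
    shared = len(aln1 & aln2)
    average = (len(aln1) + len(aln2)) / 2
    return shared / average if average else 0.0

def average_overlap_score(alignments):
    """Mean overlap score over all pairs of input alignments; a low value
    means the alignments disagree, suggesting a difficult problem."""
    scores = [overlap_score(a, b) for a, b in combinations(alignments, 2)]
    return sum(scores) / len(scores) if scores else 0.0

def multiple_overlap_score(target, alignments):
    """Score a single alignment by how often each of its pairs occurs in
    the whole set of alignments (the normalization is an assumed choice)."""
    if not target:
        return 0.0
    counts = Counter(pair for aln in alignments for pair in aln)
    weight = 1.0 / len(alignments)  # a pair present in every alignment gets weight 1
    return sum(counts[pair] * weight for pair in target) / len(target)

# Toy example: three "alignments" of sequences s1 and s2, each a set of
# pairs of the form ((sequence name, residue index), (sequence name, residue index)).
a1 = {(("s1", 0), ("s2", 0)), (("s1", 1), ("s2", 1))}
a2 = {(("s1", 0), ("s2", 0)), (("s1", 1), ("s2", 2))}
a3 = {(("s1", 0), ("s2", 0)), (("s1", 1), ("s2", 1))}
print(overlap_score(a1, a2))                     # 0.5
print(average_overlap_score([a1, a2, a3]))       # ~0.67
print(multiple_overlap_score(a1, [a1, a2, a3]))  # ~0.83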
Another score that provides an internal control for estimating the consistency of different aligners is the heads-or-tails (HoT) score [27]. This consistency test is based on the assumption that biological sequences do not have a particular direction, and thus that alignments should be unaffected by whether the input
sequences are given in the original or reversed order. The agree-
ment between the alignments obtained from the original and
reversed sequences can be quantified with the overlap measures
outlined above.
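In code, the HoT test amounts to aligning the sequences as given, aligning them again with every sequence reversed, mapping the reversed coordinates back, and comparing the two resulting pair sets with the overlap score defined above. In the sketch below, the align and to_pairs callables are placeholders for whichever aligner and alignment parser are being tested; they are assumptions of this illustration, not part of the published method [27].

def hot_score(sequences, align, to_pairs):
    """Heads-or-tails consistency: overlap between the alignment of the
    original sequences and the alignment of the reversed sequences.

    sequences -- dict mapping sequence name -> sequence string
    align     -- callable {name: seq} -> alignment (the aligner under test)
    to_pairs  -- callable alignment -> set of aligned residue pairs of the
                 form ((name1, pos1), (name2, pos2))
    """
    def canonical(pairs):
        # Order the two residues of each pair consistently so sets compare.
        return {tuple(sorted(pair)) for pair in pairs}

    heads = canonical(to_pairs(align(sequences)))

    # Reverse every sequence, realign, then map residue positions back to
    # original coordinates (position i in a reversed sequence of length L
    # corresponds to position L - 1 - i in the original).
    lengths = {name: len(seq) for name, seq in sequences.items()}
    tails_aln = align({name: seq[::-1] for name, seq in sequences.items()})
    tails = canonical(
        ((n1, lengths[n1] - 1 - p1), (n2, lengths[n2] - 1 - p2))
        for (n1, p1), (n2, p2) in to_pairs(tails_aln)
    )
    return overlap_score(heads, tails)

A HoT score close to one indicates that the aligner produces essentially the same alignment in both directions; a low score flags direction-dependent, and therefore uncertain, alignments.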
Both these consistency approaches—consistency among
aligners and HoT score—are attractive because they assume no
reference alignment or model of sequence evolution, and thus
can be readily employed. Furthermore, high consistency is a necessary property of a set of accurate aligners, and is therefore desirable. The consistency criterion also appeals to the intuitive idea
of “independent validation”—although most aligners have many
aspects in common and are thus hardly “independent.”
The biggest weakness of consistency is that it is no guarantee of
correctness: methods can be consistently wrong. More subtly,
consistency is sensitive to the choice of aligners in the set. This
can be partly mitigated by including as many different alignments as
possible [26]; nevertheless, it is easy to imagine cases where an
accurate alignment, outnumbered by inaccurate, but similar,
alignments, will be rated poorly. For instance, a new method solv-
ing a problem endemic to existing aligners will have low consistency
scores.
Likewise, while low HoT scores can be indicative of consider-
able alignment uncertainty, the converse is not necessarily true.
Hall reported that on simulated data at least, HoT scores tend to
overestimate alignment accuracy [28]. That being said, considering
the simplicity of HoT's scheme, the correlation Hall observed
between HoT and simulation-based measures of alignment