all paired aligned residues are determined over all sequences in
every alignment. The overlap score for two alignments is calculated
by counting the aligned pairs present in both alignments,
and dividing by the average number of pairs in the alignments.
Hence, two almost identical alignments have an overlap score
close to one, while two very different alignments have an overlap
score close to zero. Two additional scores based on this concept are
the average overlap score, and the multiple overlap score. The
average overlap score is simply the mean of the overlap scores
measured over all pairs of input alignments, and serves as an indicator of the difficulty of the alignment problem (low agreement among aligners suggests a harder problem). The multiple overlap score is
a weighted sum of all pairs present in a single alignment, with the
weight determined by the number of times each pair appears in the
whole set of alignments. It is assumed that a high multiple overlap
score, gained by an alignment with a high proportion of commonly
observed pairs, corresponds to a good performance.
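As a concrete illustration, the sketch below shows one way these overlap-based scores could be computed in Python, assuming each alignment has already been reduced to a set of aligned residue pairs, each pair identifying a residue in one sequence matched to a residue in another. The function names, the pair representation, and the exact normalization of the multiple overlap score are illustrative choices made here, not details taken from the original publications.

from collections import Counter
from itertools import combinations

def overlap_score(aln1, aln2):
    """Aligned pairs shared by both alignments, divided by the average
    number of pairs in the two alignments."""
    shared = len(aln1 & aln2)
    average = (len(aln1) + len(aln2)) / 2
    return shared / average if average else 0.0

def average_overlap_score(alignments):
    """Mean overlap score over all pairs of input alignments; a low value
    means the alignments disagree, suggesting a difficult problem."""
    scores = [overlap_score(a, b) for a, b in combinations(alignments, 2)]
    return sum(scores) / len(scores) if scores else 0.0

def multiple_overlap_score(target, alignments):
    """Score a single alignment by how often each of its pairs occurs in
    the whole set of alignments (the normalization is an assumed choice)."""
    if not target:
        return 0.0
    counts = Counter(pair for aln in alignments for pair in aln)
    weight = 1.0 / len(alignments)  # a pair present in every alignment gets weight 1
    return sum(counts[pair] * weight for pair in target) / len(target)

# Toy example: three "alignments" of sequences s1 and s2, each a set of
# pairs of the form ((sequence name, residue index), (sequence name, residue index)).
a1 = {(("s1", 0), ("s2", 0)), (("s1", 1), ("s2", 1))}
a2 = {(("s1", 0), ("s2", 0)), (("s1", 1), ("s2", 2))}
a3 = {(("s1", 0), ("s2", 0)), (("s1", 1), ("s2", 1))}
print(overlap_score(a1, a2))                     # 0.5
print(average_overlap_score([a1, a2, a3]))       # ~0.67
print(multiple_overlap_score(a1, [a1, a2, a3]))  # ~0.83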
Another score that provides an internal control for estimating the consistency of different aligners is the heads-or-tails (HoT) score [27]. This consistency test is based on the assumption that biological sequences do not have a particular direction, and thus that alignments should be unaffected by whether the input
sequences are given in the original or reversed order. The agree-
ment between the alignments obtained from the original and
reversed sequences can be quantified with the overlap measures
outlined above.
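In code, the HoT test amounts to aligning the sequences as given, aligning them again with every sequence reversed, mapping the reversed coordinates back, and comparing the two resulting pair sets with the overlap score defined above. In the sketch below, the align and to_pairs callables are placeholders for whichever aligner and alignment parser are being tested; they are assumptions of this illustration, not part of the published method [27].

def hot_score(sequences, align, to_pairs):
    """Heads-or-tails consistency: overlap between the alignment of the
    original sequences and the alignment of the reversed sequences.

    sequences -- dict mapping sequence name -> sequence string
    align     -- callable {name: seq} -> alignment (the aligner under test)
    to_pairs  -- callable alignment -> set of aligned residue pairs of the
                 form ((name1, pos1), (name2, pos2))
    """
    def canonical(pairs):
        # Order the two residues of each pair consistently so sets compare.
        return {tuple(sorted(pair)) for pair in pairs}

    heads = canonical(to_pairs(align(sequences)))

    # Reverse every sequence, realign, then map residue positions back to
    # original coordinates (position i in a reversed sequence of length L
    # corresponds to position L - 1 - i in the original).
    lengths = {name: len(seq) for name, seq in sequences.items()}
    tails_aln = align({name: seq[::-1] for name, seq in sequences.items()})
    tails = canonical(
        ((n1, lengths[n1] - 1 - p1), (n2, lengths[n2] - 1 - p2))
        for (n1, p1), (n2, p2) in to_pairs(tails_aln)
    )
    return overlap_score(heads, tails)

A HoT score close to one indicates that the aligner produces essentially the same alignment in both directions; a low score flags direction-dependent, and therefore uncertain, alignments.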
Both these consistency approaches—consistency among
aligners and HoT score—are attractive because they assume no
reference alignment or model of sequence evolution, and thus
can be readily employed. Furthermore, high consistency is a necessary property of a set of accurate aligners, and is therefore desirable. The consistency criterion also appeals to the intuitive idea
of “independent validation”—although most aligners have many
aspects in common and are thus hardly “independent.”
The biggest weakness of consistency is that it is no guarantee of
correctness: methods can be consistently wrong. More subtly,
consistency is sensitive to the choice of aligners in the set. This
can be partly mitigated by including as many different alignments as
possible [26]; nevertheless, it is easy to imagine cases where an
accurate alignment, outnumbered by inaccurate, but similar,
alignments, will be rated poorly. For instance, a new method solv-
ing a problem endemic to existing aligners will have low consistency
scores.
Likewise, while low HoT scores can be indicative of consider-
able alignment uncertainty, the converse is not necessarily true.
Hall reported that on simulated data at least, HoT scores tend to
overestimate alignment accuracy [28]. That being said, considering
the simplicity of HoT's scheme, the correlation Hall observed
between HoT and simulation-based measures of alignment