Biology Reference
In-Depth Information
their chains of α-carbon atoms, estimated by least squares over
translations and rotations of their respective 3D protein structures
(which are known a priori). A simple score is given by the
root-mean-square deviation between superposed α-carbon atoms,
whereas a more refined score also takes into account the orientation
of these atoms [48].
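As an illustration of the simple score, the following minimal sketch computes the root-mean-square deviation between two sets of α-carbon coordinates. It assumes that the superposition (the least-squares fit over translations and rotations) has already been carried out and that the two coordinate lists are paired residue by residue. The function name `ca_rmsd` and the tuple-based input format are chosen here for illustration and are not taken from any benchmark package.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired alpha-carbon coordinates.

    Both arguments are equal-length sequences of (x, y, z) tuples.
    The structures are assumed to be superposed already; no fitting
    is performed here.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("at least one atom pair is required")

    # Sum of squared Euclidean distances over all paired atoms.
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2

    return math.sqrt(squared_sum / len(coords_a))
```

A lower value indicates closer agreement between the superposed backbones. The more refined score mentioned above, which also accounts for atom orientation, is not covered by this sketch.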
Two final aspects of structural benchmarks further complicate
their application in MSA assessment. The fact that reliable annota-
tions exist only for structurally conserved sequences means that
MSA of any region of the genome other than structured protein
coding regions—be it intronic, regulatory, natively disordered, or
simply poorly annotated—cannot be effectively assessed using
existing structural benchmarks [ 4 , 35 ]. This is particularly impor-
tant given that only a very small fraction of genome sequences
encode globular, folded protein domains, and that both structural
benchmarks and MSA tools focus mainly on alignment of this very
small portion of sequences. The current state of sequencing
technologies also means that sequence data come with many artifacts
due to sequencing errors, short read length, and/or poor gene
prediction models [4, 8, 42, 43], which have only very recently
begun to be accounted for in benchmarks [4].
Considering all these complications, it becomes apparent that
the map between structure and alignment is neither straightfor-
ward nor unequivocal. And indeed, by annotating known domains
in reference datasets (or estimating superfamilies when the domain
was unavailable), and then comparing annotation agreement in the
reference alignments by use of column scores, Edgar found incon-
sistencies in the assignment of aligned residues to specific secondary
structure in both PREFAB and BAliBASE [ 3 ].
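The idea of checking annotation agreement column by column can be sketched as follows. This is a simplified illustration and not Edgar's actual procedure: it assumes each sequence carries one annotation label per residue (for example a secondary-structure letter), and it counts a column as consistent when all of its non-gap residues share the same label. The function name `annotation_column_score` and the input format are hypothetical.

```python
def annotation_column_score(aligned, annotations, gap="-"):
    """Fraction of alignment columns whose residues share one annotation.

    aligned      -- list of equal-length gapped sequences (strings)
    annotations  -- list of ungapped label strings, one label per residue
    Columns with fewer than two residues are skipped, since they carry
    no information about agreement.
    """
    if len(aligned) != len(annotations):
        raise ValueError("one annotation string is required per sequence")
    if len({len(row) for row in aligned}) > 1:
        raise ValueError("aligned sequences must have equal length")
    for row, labels in zip(aligned, annotations):
        if len(row) - row.count(gap) != len(labels):
            raise ValueError("annotation length must match residue count")
    if not aligned:
        return 0.0

    # Next unread residue index for each sequence.
    positions = [0] * len(aligned)
    consistent = 0
    scored = 0
    for col in range(len(aligned[0])):
        labels = []
        for i, row in enumerate(aligned):
            if row[col] != gap:
                labels.append(annotations[i][positions[i]])
                positions[i] += 1
        if len(labels) >= 2:
            scored += 1
            if len(set(labels)) == 1:
                consistent += 1

    return consistent / scored if scored else 0.0
```

Under these assumptions, a score below 1 for a reference alignment would point to columns in which aligned residues carry conflicting annotations, which is the kind of inconsistency described above.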
5 Phylogenetic Tests of Alignment
Our last type of benchmark is phylogenetic tests of alignment.
Dessimoz and Gil [ 44 ] have recently introduced such tests, devel-
oping an MSA assessment pipeline that explicitly takes into consid-
eration phylogenetic relationships within the input sequence data
to evaluate the validity of alignment hypotheses generated by
different MSA methods.
This approach to benchmarking involves deriving alignments
of the test data from different MSA packages as the starting point
for tree building. The principle of the tests is simple: the more
accurate the resulting tree, the more accurate the underlying
alignment is assumed to be. The quality of the tree is measured by its
compliance with an auxiliary principle or model, auxiliary in the
sense that the additional knowledge it introduces is independent of
the sequence data. So far, two methods have been devised. In the first,
referred to as the “species tree discordance test,” test alignments are