3.3 Evaluating Alignments
A major issue with MSAs is accuracy estimation. It is well documented
that different MSA packages deliver significantly different alignments
of the same dataset, which can affect the results of any downstream
analysis. Comparing and evaluating MSAs is therefore an essential step
in assessing their quality and reliability. Currently, aligners are
compared against external, expert-curated reference alignments or
benchmarks. However, it has been shown that no aligner performs better
than all others on every available benchmark. T-Coffee provides three
different scoring systems that allow alternative alignments to be
compared independently of such reference alignments.
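To make the reference-based comparison mentioned above concrete, here is a minimal Python sketch of a sum-of-pairs style score: the fraction of residue pairs aligned in a reference alignment that a test alignment reproduces. The function names and the two toy alignments are illustrative assumptions and are not part of T-Coffee or of any benchmark suite.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs placed in the same column.

    `alignment` maps sequence name -> gapped string ('-' marks a gap).
    Each pair is ((name_a, index_a), (name_b, index_b)), where the
    indices count residues in the ungapped sequences.
    """
    names = sorted(alignment)
    length = len(alignment[names[0]])
    counters = {name: 0 for name in names}
    pairs = set()
    for col in range(length):
        present = []
        for name in names:
            if alignment[name][col] != "-":
                present.append((name, counters[name]))
                counters[name] += 1
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of reference residue pairs that the test alignment reproduces."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


# Toy data (hypothetical): the test alignment misplaces one residue of s1.
reference = {"s1": "AC-GT", "s2": "ACDGT"}
test = {"s1": "ACG-T", "s2": "ACDGT"}
print(sum_of_pairs_score(test, reference))  # 3 of 4 reference pairs -> 0.75
```

A score of this kind depends entirely on the chosen reference, which is why reference-independent measures are useful when no trusted reference exists.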
1. Sequence-only, consistency-based accuracy evaluation: Consistency
of Overall Residue Evaluation (CORE) index.
The CORE index is one of the most versatile tools developed to
display the agreement between a set of alignments and a given
model. It is based directly on the T-Coffee consistency estimation
scheme. Every aligned residue is colored according to its
consistency score (red for high and blue for low). This normalized
score reflects the agreement between the actual alignment of the
residue (its column) and the alternative alignments. It can be
displayed by running the following command:
t_coffee sh3.fasta -method mafft_pair clustalw2_pair \
  proba_pair poa_pair -output score_html,score_ascii
In this example, the scores are written in ASCII format and can be
viewed in the corresponding HTML file. In this particular case, the
score measures the agreement between the four methods considered
(mafft, clustalw, tcoffee, and poa) and the final alignment. The
CORE index is only meaningful for (1) datasets containing at least
four sequences when running a single aligner and (2) alignments
combining at least three methods when using the M-Coffee mode.
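The idea of scoring each residue by its agreement with alternative alignments can be illustrated with a deliberately simplified Python sketch. This is not the T-Coffee implementation of the CORE index: it assumes that a residue's score is the average fraction of alternative alignments supporting each of its aligned pairs, and all names and toy alignments below are hypothetical.

```python
from itertools import combinations


def column_pairs(alignment):
    """Set of unordered residue pairs (frozensets) that share a column.

    `alignment` maps sequence name -> gapped string ('-' marks a gap);
    a residue is identified as (name, index in the ungapped sequence).
    """
    names = sorted(alignment)
    counters = dict.fromkeys(names, 0)
    pairs = set()
    for col in range(len(alignment[names[0]])):
        present = []
        for name in names:
            if alignment[name][col] != "-":
                present.append((name, counters[name]))
                counters[name] += 1
        pairs.update(frozenset(p) for p in combinations(present, 2))
    return pairs


def residue_consistency(final, alternatives):
    """Map each aligned residue of `final` to a score between 0 and 1.

    Simplifying assumption: the score is the mean, over the residue's
    aligned pairs, of the fraction of alternative alignments that contain
    the same pair. Residues with no partner in their column are omitted.
    """
    if not alternatives:
        raise ValueError("at least one alternative alignment is required")
    alt_pair_sets = [column_pairs(alt) for alt in alternatives]
    support = {}  # residue -> list of per-pair support fractions
    for pair in column_pairs(final):
        frac = sum(pair in s for s in alt_pair_sets) / len(alt_pair_sets)
        for residue in pair:
            support.setdefault(residue, []).append(frac)
    return {res: sum(vals) / len(vals) for res, vals in support.items()}


# Toy data (hypothetical): two alternatives, one disagreeing on a residue.
final = {"s1": "AC-GT", "s2": "ACDGT"}
alternatives = [
    {"s1": "AC-GT", "s2": "ACDGT"},
    {"s1": "ACG-T", "s2": "ACDGT"},
]
scores = residue_consistency(final, alternatives)
print(scores[("s1", 0)], scores[("s1", 2)])  # 1.0 0.5
```

In this toy case the third residue of s1 is supported by only one of the two alternatives, so it receives a lower score than the residues on which both alternatives agree.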
2. Single-structure accuracy evaluation: STRIKE.
The STRIKE [17] score is the latest scoring system developed
within the T-Coffee package to identify the most accurate MSA
among alternative MSAs of the same sequence dataset. Structural
information is often considered the gold standard for assessing
protein MSAs; however, it is frequently unavailable or scarce.
STRIKE requires only a single homologous 3D structure to evaluate
and rank alternative alignments of a given dataset. MSA accuracy
is computed using a contact matrix estimated from residue-residue
contacts in a dataset of nonredundant, high-quality protein
structures from the ASTRAL database. STRIKE can be run using the
following command: