accurate MSAs is thus essential for these downstream analyses and
biological applications to be correct. However, computing an accurate
alignment is not a trivial task: accurately performing multiple
comparisons is a complex problem from both a computational and a
biological point of view.
From a computational point of view, estimating a correct mul-
tiple alignment has been shown to be a nondeterministic polyno-
mial (NP)-complete problem, even when using very simple
measurements such as sequence identity [ 3 ]. In practice, this means
that rather than estimating an optimal alignment, one has to rely on
an approximate model delivered by a heuristic algorithm. The design of
such algorithms has been and is still an intense focus of interest [ 4 ]
and it might be argued that a majority of the available packages can
be described as alternative heuristics designed for optimizing similar
objective functions. Among all currently available algorithms,
most of the new-generation aligners include a consistency-based
component similar to the one originally described in T-Coffee
(Tree-based consistency objective function for alignment evalua-
tion) [ 5 ]. Consistency-based aligners such as T-Coffee, despite
being slower than other algorithms, have also been shown to be
much more accurate. T-Coffee combines a consistency-based evaluation
with a fast standard assembly algorithm such as the progressive
alignment method used in ClustalW [ 6 ]. This combination not only
yields alignments with a higher accuracy, but also
results in a framework where methods, sequences, and structures
can be seamlessly combined and compared.
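
The consistency idea mentioned above can be illustrated with a small sketch. It assumes a "library" of weighted residue pairs collected from pairwise alignments and re-scores each pair by adding the support it receives through residues of a third sequence. The function names, the data layout, and the weights are hypothetical and chosen for illustration only; this is a simplified reading of consistency-based scoring, not the actual T-Coffee implementation.

```python
from collections import defaultdict

# A residue is identified by (sequence_name, position).
# The primary library maps a residue pair to a weight (illustrative values).


def make_library(pairwise_weights):
    """Build a symmetric lookup: residue -> {aligned residue: weight}."""
    lib = defaultdict(dict)
    for (res_a, res_b), weight in pairwise_weights.items():
        lib[res_a][res_b] = weight
        lib[res_b][res_a] = weight
    return lib


def extended_weight(lib, res_a, res_b):
    """Direct weight of (res_a, res_b) plus support through third sequences.

    For every residue res_c of another sequence that is paired with res_a,
    add min(W(a, c), W(c, b)); the term is zero when res_c is not paired
    with res_b, so only consistent triplets contribute.
    """
    total = lib[res_a].get(res_b, 0)
    for res_c, w_ac in lib[res_a].items():
        if res_c[0] in (res_a[0], res_b[0]):
            continue  # the intermediate residue must come from a third sequence
        w_cb = lib[res_c].get(res_b, 0)
        total += min(w_ac, w_cb)
    return total


# Hypothetical primary library for three sequences A, B, and C.
primary = {
    (("A", 3), ("B", 3)): 88,
    (("A", 3), ("C", 2)): 77,
    (("C", 2), ("B", 3)): 100,
    (("A", 3), ("B", 5)): 88,  # competing pair with no support through C
}
lib = make_library(primary)

supported = extended_weight(lib, ("A", 3), ("B", 3))
unsupported = extended_weight(lib, ("A", 3), ("B", 5))
print(supported, unsupported)
```

In this toy library the two candidate pairs start with the same direct weight, but only the pair that is also consistent with sequence C gains extra weight. A progressive assembly step could then use such re-scored weights in place of a generic substitution matrix.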
The biological issue is just as challenging. The main difficulty lies
in quantifying the correctness of an alignment. For
instance, if one assumes an evolutionary framework, a correct align-
ment can be defined as an alignment in which all residues
corresponding to the same residue in the ancestral sequence are
aligned to one another. Yet, estimating the evolutionary correctness
would require knowing in advance the relationship among residues,
something usually impossible. Likewise, if the alignment is computed
in a structural framework, a correct alignment will be defined as an
alignment where the aligned residues are all structurally homologous;
it would therefore require the knowledge of the structure of each of
the included sequences or a perfect understanding of the relationship
between structure and sequence. As a consequence, MSAs are usually
estimated on the basis of sequence similarity, taking advantage of
evolutionary inertia. This approach works reasonably well for closely
related sequences, and it has been shown that structurally correct
alignments can easily be inferred for sequences having more
than 30% identity; below this figure (the so-called twilight zone),
direct pairwise comparison becomes much less informative. Nonetheless,
MSA-based analysis can be used to align more distantly
related sequences, provided that highly conserved featured positions
can be used to estimate and validate the model. T-Coffee was