• Consistency-Based Methods: Construct a database of local and global alignments to find a final alignment (e.g. T-Coffee, DiAlign, ProbCons)
• Structure-Based Methods: Utilize external knowledge such as protein structure (e.g. 3D-Coffee)
In heuristic implementations of MSA algorithms, the most widely used scoring scheme is the sum-of-pairs (SP) score. The main idea is to progressively combine induced pairwise alignments to obtain the final MSA. Here, we briefly review a few such algorithms and indicate the specific improvements each program offers.
3.1 ClustalW

ClustalW is one of the earliest multiple sequence alignment programs, and it is still widely used. It works in three main steps. First, it aligns every pair of sequences by global dynamic programming with a suitable scoring function. Second, it uses the pairwise alignment scores to build a phylogenetic (guide) tree with the Neighbor-Joining algorithm. Finally, the sequences are progressively aligned from the leaves to the root, yielding the MSA of all sequences [ 10 ].

During tree traversal, the highest-scoring pairs are combined first. As new sequences are added to the MSA, however, the structure of the initial alignments is carried forward. This introduces a twofold problem. Because progressive alignment is greedy, it can get trapped in a local optimum of the overall MSA score. In addition, errors made in the early alignments cannot be corrected later. To alleviate these problems, ClustalW assigns weights to sequences. Suppose an edge of length $l$ leads to a set $n$ of sequences. Each sequence $n_i \in n$ that can be reached by traversing that edge then receives a weight of $l/|n|$ from it. A sequence's total weight is the sum of these contributions along its path to the root. In this way, the traversal depends not only on the tree topology but also on the distances between sequences derived from the pairwise alignment scores.

One of the main improvements offered by ClustalW is its choice of gap-penalty parameters. Insertions and deletions are less frequent in the protein core. ClustalW therefore treats runs of hydrophilic residues (e.g. 5 or more in a row) as an indication of loops or random-coil regions, and it reduces the gap opening penalty within these stretches. It also increases the gap opening penalty for gaps that are less than eight residues apart. This rule is based on the observation that, in alignments of sequences with known structures, gaps rarely occur within 8 residues of each other [ 11 ]. The initial gap opening penalty (GOP) and gap extension penalty (GEP) are defined as follows:
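$$\mathrm{GOP}_{\mathrm{init}} = \bigl(\mathrm{GOP} + \log(\min(N, M))\bigr) \times S(a, b) \times \mathrm{ISF}$$

$$\mathrm{GEP}_{\mathrm{init}} = \mathrm{GEP} \times \left(1.0 + \left|\log\frac{N}{M}\right|\right)$$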
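ClustalW's second step, building a guide tree from pairwise distances with Neighbor-Joining, can be illustrated with a minimal Python sketch. It assumes a symmetric distance matrix, for instance one derived from the pairwise alignment scores of the first step. The function name `neighbor_joining` and the nested-tuple output format are hypothetical choices for this example. Unlike ClustalW, the sketch returns only the unrooted topology: it neither computes branch lengths nor roots the tree.

```python
from itertools import combinations


def neighbor_joining(dist, names):
    """Return an unrooted neighbor-joining topology as nested tuples.

    dist:  symmetric matrix (list of lists) of pairwise distances
    names: sequence labels, in the same order as the rows of dist
    """
    # Distances keyed by node pairs; a node is a label or a tuple of joined nodes
    d = {(a, b): dist[i][j]
         for i, a in enumerate(names) for j, b in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        # r(x): total distance from x to every active node
        r = {x: sum(d[(x, y)] for y in nodes) for x in nodes}
        # Join the pair minimizing Q(x, y) = (n - 2) d(x, y) - r(x) - r(y)
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        new = (a, b)
        d[(new, new)] = 0.0
        for c in nodes:
            if c != a and c != b:
                # Distance from the new internal node to each remaining node
                d[(new, c)] = d[(c, new)] = 0.5 * (d[(a, c)] + d[(b, c)] - d[(a, b)])
        nodes = [c for c in nodes if c != a and c != b] + [new]
    return tuple(nodes)


names = ["A", "B", "C", "D"]
dist = [[0, 3, 3, 5],
        [3, 0, 4, 6],
        [3, 4, 0, 4],
        [5, 6, 4, 0]]
print(neighbor_joining(dist, names))  # (('A', 'B'), ('C', 'D'))
```

The distances in the example come from a made-up four-sequence tree in which A and B are neighbors, as are C and D. The sketch recovers that grouping.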
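The sequence-weighting rule described above can be sketched in Python. This is a minimal illustration that assumes the guide tree is given as a hypothetical `Node` structure carrying branch lengths. Each edge's length is split evenly among the sequences below it, and a sequence's weight is the sum of its shares along the path to the root. The tree and branch lengths in the example are made up. The weights are raw values, before any normalization the program may apply.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    length: float = 0.0                 # length of the edge from the parent to this node
    name: Optional[str] = None          # sequence label (leaves only)
    children: List["Node"] = field(default_factory=list)


def leaf_names(node: Node) -> List[str]:
    """All sequence labels below (or at) this node."""
    if not node.children:
        return [node.name]
    return [x for child in node.children for x in leaf_names(child)]


def sequence_weights(root: Node) -> Dict[str, float]:
    """An edge of length l above k sequences gives each of them l / k;
    a sequence's weight is the sum of its shares along the path to the root."""
    weights: Dict[str, float] = {}

    def visit(node: Node, acc: float) -> None:
        acc += node.length / len(leaf_names(node))
        if not node.children:
            weights[node.name] = acc
        for child in node.children:
            visit(child, acc)

    visit(root, 0.0)
    return weights


tree = Node(children=[
    Node(0.25, "A"),
    Node(0.5, children=[Node(0.25, "B"), Node(0.75, "C")]),
])
print(sequence_weights(tree))  # {'A': 0.25, 'B': 0.5, 'C': 1.0}
```

Because B and C share the internal edge, each receives only half of its length. C, on a longer terminal branch, ends up with the largest weight.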
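The gap-penalty rules and formulas above can be sketched in Python. This is a minimal illustration, not ClustalW's implementation, and it rests on several assumptions:

• the natural logarithm is used;
• `avg_mismatch` and `isf` stand for S(a, b) and ISF and must be supplied by the caller;
• the absolute value of log(N/M) is taken, so that the extension penalty grows with the length difference regardless of which sequence is longer;
• the hydrophilic residue set, the reduction factor inside hydrophilic runs, and the increase factor near existing gaps are hypothetical values chosen for the example;
• the function names are hypothetical.

```python
import math

# Hypothetical parameters for this sketch (not the program's exact values)
HYDROPHILIC = set("GPSNDQEKR")   # assumed hydrophilic residue set
HYDRO_RUN = 5                    # minimum run length treated as a loop
HYDRO_FACTOR = 1 / 3             # assumed reduction inside hydrophilic runs
GAP_WINDOW = 8                   # "less than eight residues apart"
NEAR_GAP_FACTOR = 2.0            # assumed increase near existing gaps


def initial_penalties(gop, gep, n, m, avg_mismatch, isf):
    """Scale GOP and GEP by the sequence lengths N and M (natural log assumed)."""
    gop_init = (gop + math.log(min(n, m))) * avg_mismatch * isf
    gep_init = gep * (1.0 + abs(math.log(n / m)))
    return gop_init, gep_init


def position_gop(seq, gop, gap_positions):
    """Per-position gap opening penalties for one sequence."""
    factors = [1.0] * len(seq)
    i = 0
    while i < len(seq):
        j = i
        while j < len(seq) and seq[j] in HYDROPHILIC:
            j += 1
        if j - i >= HYDRO_RUN:  # long hydrophilic run: likely loop or coil
            for k in range(i, j):
                factors[k] *= HYDRO_FACTOR
        i = max(j, i + 1)
    for k in range(len(seq)):
        # Raise the penalty near (but not at) an existing gap
        if any(0 < abs(k - g) < GAP_WINDOW for g in gap_positions):
            factors[k] *= NEAR_GAP_FACTOR
    return [gop * f for f in factors]


print(initial_penalties(10.0, 0.2, 100, 50, 1.0, 1.0))
print(position_gop("AAAADEKRSAAA", 10.0, []))
```

In the second call, only the five-residue hydrophilic run DEKRS (positions 4 to 8) gets a reduced opening penalty. Passing an existing gap position instead doubles the penalty for residues one to seven positions away from it.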