complexity of protein structure is reflected in the algorithms used to
align them. Structural alignment methods are generally slow compared
with sequence alignment methods, so any effort to combine
the two must weigh the costs of such integration against the benefits.
There are also conceptual questions that need to be addressed.
The biggest one is how to incorporate structural information into
MSA calculations. A structural alignment is generally at least as
accurate as a sequence alignment. However, not all parts of the
alignment are equally reliable. For example, as a general rule, “core”
residues align more reliably than residues close to the molecular
surface. When importing structural alignments into MSA calculations,
we need a way of describing such variations in alignment quality.
Below we will describe a particular structural alignment package,
ASH, and discuss MAFFT-ASH integration at a conceptual level.
ASH [53] is a pairwise protein structural alignment program that is
based on the double dynamic programming (DDP) algorithm
originally proposed by Orengo and Taylor [54, 55] and extended
by Toh [56]. The source code of ASH is available from the Protein
Data Bank Japan.
An essential feature of ASH is that the alignment is generated
from a score matrix defined purely in terms of the structure of the
two proteins. A particular element in the score matrix takes the
form of a Gaussian-shaped function of the inter-residue distance
$$e_{ij} = \exp\left(-\left(d_{ij}/d_0\right)^2\right),$$
where $d_{ij}$ is the distance between alpha carbon $i$ of the first input
structure and alpha carbon $j$ of the second, and $d_0$ is a parameter that
defines the tolerance of the score. The alignment results are fairly robust
with respect to the particular choice of $d_0$, and by default the parameter
is set to 4 Å. The distance between any two residues in the two input
structures obviously depends on the relative displacement and orientation
of the two structures. Thus the goal of ASH is to find the relative
orientation that maximizes the equivalences summed over the alignment.
For domains that are topologically quite similar, minimizing the
root-mean-square deviation (RMSD) for a continuous subsequence of
residues can provide a good initial guess. However, repeating structural
motifs can cause problems with convergence to a unique global maximum.
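The interplay between the score and the search over relative orientation can be illustrated with a short sketch. It assumes C-alpha coordinates are given as NumPy arrays of shape (n, 3). It uses Kabsch superposition for the RMSD-minimizing initial guess on the first `seed_len` residues (assumed to correspond), scores every residue pair with $e_{ij} = \exp(-(d_{ij}/d_0)^2)$, and then alternates a gap-penalty-free dynamic programming alignment with re-superposition on the aligned pairs. This is a simplified stand-in, not ASH's double dynamic programming. All function names and the `seed_len` and `cycles` parameters are hypothetical.

```python
import numpy as np


def kabsch_transform(mobile, target):
    """Return a function applying the rigid-body fit that minimizes the RMSD
    between `mobile` and `target` ((n, 3) arrays paired row by row)."""
    mob_c, tgt_c = mobile.mean(axis=0), target.mean(axis=0)
    H = (mobile - mob_c).T @ (target - tgt_c)   # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # exclude reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T     # optimal rotation
    return lambda X: (X - mob_c) @ R.T + tgt_c


def equivalence_matrix(ca_a, ca_b, d0=4.0):
    """e_ij = exp(-(d_ij / d0)^2) for every C-alpha pair (i in A, j in B)."""
    d = np.linalg.norm(ca_a[:, None, :] - ca_b[None, :, :], axis=-1)
    return np.exp(-(d / d0) ** 2)


def align_equivalences(e):
    """Order-preserving residue pairs maximizing the summed e_ij (no gap penalty)."""
    n, m = e.shape
    S = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i, j] = max(S[i - 1, j - 1] + e[i - 1, j - 1], S[i - 1, j], S[i, j - 1])
    pairs, i, j = [], n, m
    while i > 0 and j > 0:                      # trace back from the corner
        if np.isclose(S[i, j], S[i - 1, j - 1] + e[i - 1, j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif np.isclose(S[i, j], S[i - 1, j]):
            i -= 1
        else:
            j -= 1
    return pairs[::-1], S[n, m]


def superpose_and_align(ca_a, ca_b, seed_len=10, d0=4.0, cycles=5):
    """Initial guess: RMSD fit of the first `seed_len` residues (assumed to
    correspond). Then alternate alignment and re-superposition of B onto A."""
    transform = kabsch_transform(ca_b[:seed_len], ca_a[:seed_len])
    for _ in range(cycles):
        e = equivalence_matrix(ca_a, transform(ca_b), d0)
        pairs, score = align_equivalences(e)
        ia = [p[0] for p in pairs]
        ib = [p[1] for p in pairs]
        transform = kabsch_transform(ca_b[ib], ca_a[ia])
    return pairs, score, e
```

With `ca_a` and `ca_b` as (n, 3) arrays, `pairs, score, e = superpose_and_align(ca_a, ca_b)` returns the aligned residue indices, the summed equivalence, and the final score matrix. Because this refinement starts from a single initial fit and only improves it locally, it shares the weakness noted for repeating motifs: a poor seed can lock it onto a shifted register.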
The residue-level equivalences, which form the basis of all ASH
alignments, provide a convenient route for combining MAFFT and
ASH. Given a set of input structures, we can compute structural
alignments for all unique pairs. We can then set a threshold for the
residue equivalence (e.g., 0.5) above which an aligned residue pair is
considered “high confidence.” MAFFT allows such “seed” alignments to be
input as restraints [57].
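Filtering for high-confidence pairs is straightforward once each pairwise alignment reports its per-pair equivalences. With $d_0$ = 4 Å, a threshold of 0.5 on $e_{ij} = \exp(-(d_{ij}/d_0)^2)$ keeps pairs with $d_{ij} \le d_0\sqrt{\ln 2} \approx 3.3$ Å. The sketch below assumes a hypothetical `align` callable that returns `(pairs, e)` for two C-alpha arrays and stands in for a pairwise structural aligner such as ASH. Writing the resulting residue pairs out as MAFFT seed input is not shown and would need to follow MAFFT's own documentation.

```python
from itertools import combinations


def high_confidence_pairs(pairs, e, threshold=0.5):
    """Keep aligned residue pairs (i, j) whose equivalence e[i, j] >= threshold."""
    return [(i, j) for i, j in pairs if e[i, j] >= threshold]


def all_pair_restraints(structures, align, threshold=0.5):
    """Align every unique pair of structures and keep high-confidence pairs.

    structures: dict mapping a name to an (n, 3) array of C-alpha coordinates.
    align: hypothetical callable (ca_a, ca_b) -> (pairs, e) standing in for a
           pairwise structural aligner such as ASH.
    Returns {(name_a, name_b): [(i, j), ...]} with residue indices of a and b.
    """
    restraints = {}
    for name_a, name_b in combinations(sorted(structures), 2):
        pairs, e = align(structures[name_a], structures[name_b])
        restraints[(name_a, name_b)] = high_confidence_pairs(pairs, e, threshold)
    return restraints
```

With three structures, for example, the function performs the three pairwise alignments (A, B), (A, C) and (B, C) and returns one filtered list per pair. Thresholding each residue pair rather than the alignment as a whole lets reliably aligned core residues pass while less certain regions remain unconstrained.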