MAFFT: Iterative Refinement and Additional Methods - Multiple Sequence Alignment Methods

complexity of protein structure is reflected in the algorithms used to

align them. Structural alignment methods are generally slow compared

with sequence alignment methods, so any effort to combine

the two must weigh the costs of such integration against the benefits.

There are also conceptual questions that need to be addressed.

The biggest one is how to incorporate structural information into

MSA calculations. A structural alignment is generally at least as

accurate as a sequence alignment. However, not all parts of the

alignment are equally reliable. For example, as a general rule “core”

residues will align better than residues close to the molecular surface.

When importing structural alignments into MSA calculations,

we need a way of describing such variations in alignment quality.

Below we will describe a particular structural alignment package,

ASH, and discuss MAFFT-ASH integration at a conceptual level.

ASH [53] is a pairwise protein structural alignment program that is

based on the double dynamic programming (DDP) algorithm

originally proposed by Orengo and Taylor [54, 55] and extended

by Toh [56]. The source code of ASH is available from the Protein

Data Bank Japan.

An essential feature of ASH is that the alignment is generated

from a score matrix defined purely in terms of the structure of the

two proteins. A particular element in the score matrix takes the

form of a Gaussian-shaped function of the inter-residue distance

$$e_{ij} = \exp\left(-\left(d_{ij}/d_0\right)^2\right),$$

where $d_{ij}$ is the distance between two alpha carbons i and j in the

two input structures and $d_0$ is a parameter that defines tolerance in

the score. The alignment results are fairly robust with respect to

the particular choice of $d_0$, and the default behavior is to set the

parameter to 4 Å. The distance between any two residues in

the two input structures is obviously a function of their relative

displacement and orientation. Thus the goal of ASH is to find the

relative orientation that maximizes the equivalences when summed

over the alignment. For domains that are topologically quite similar,

minimization of the root-mean-square deviation (RMSD) for a

continuous subsequence of residues can provide a good initial

guess. However, cases of repeating structural motifs can cause

problems with convergence to a unique global maximum.
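
The residue equivalence defined above can be illustrated with a minimal Python sketch. It is not taken from the ASH source code: the function name equivalence_matrix and the toy coordinates are hypothetical, and the sketch assumes the two structures have already been placed in a common frame. It does not perform the orientation search or the double dynamic programming step.

```python
import math

def equivalence_matrix(coords_a, coords_b, d0=4.0):
    """Return e[i][j] = exp(-(d_ij / d0)**2) for two lists of C-alpha coordinates.

    coords_a, coords_b: lists of (x, y, z) tuples in angstroms, assumed to be
    expressed in the same (already superposed) coordinate frame.
    d0: tolerance parameter in angstroms (4.0 is the default quoted in the text).
    """
    matrix = []
    for a in coords_a:
        row = []
        for b in coords_b:
            d_ij = math.dist(a, b)                 # Euclidean distance between the two atoms
            row.append(math.exp(-(d_ij / d0) ** 2))
        matrix.append(row)
    return matrix

# Toy coordinates (hypothetical, for illustration only).
structure_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 0.0), (3.8, 0.0, 4.0)]
e = equivalence_matrix(structure_a, structure_b)
```

Coincident atoms score exactly 1, and a pair separated by exactly d0 scores exp(-1), roughly 0.37, so the score decays smoothly rather than cutting off at a fixed distance.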

The residue-level equivalences, which form the basis of all ASH

alignments, provide a convenient route for combining MAFFT and

ASH. Given a set of input structures, we can compute structural

alignments for all unique pairs. We can then set a threshold for the

residue equivalence (e.g., 0.5) and treat residue pairs scoring above it as “high

confidence.” MAFFT allows such “seed” alignments to be input as

restraints [57].
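
The filtering step described above might be sketched as follows. This is an illustration under stated assumptions rather than the actual MAFFT-ASH pipeline: the helper names unique_pairs and high_confidence_pairs are hypothetical, each pairwise structural alignment is assumed to be available as a list of (i, j, equivalence) tuples, and writing the result in the seed format that MAFFT expects is not shown.

```python
from itertools import combinations

def unique_pairs(names):
    """List every unordered pair of structures, each pair exactly once."""
    return list(combinations(names, 2))

def high_confidence_pairs(aligned_pairs, threshold=0.5):
    """Keep only aligned residue pairs whose equivalence exceeds the threshold.

    aligned_pairs: list of (i, j, e_ij) tuples from one pairwise structural
    alignment, where i and j are residue indices in the two structures.
    Returns the (i, j) index pairs regarded as "high confidence".
    """
    return [(i, j) for i, j, e_ij in aligned_pairs if e_ij > threshold]

# Hypothetical structure names and alignment output, for illustration only.
pairs = unique_pairs(["A", "B", "C"])
seed = high_confidence_pairs([(0, 0, 0.9), (1, 1, 0.5), (2, 3, 0.2)])
```

With three input structures there are three unique pairs to align. In the toy alignment only the first residue pair survives, because the comparison is strict and a score of exactly 0.5 is not kept; whether the boundary value should be included is a design choice that the text does not specify.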
