Moreover, the last line in Table 1 shows the results of profile
alignment, in which the existing alignment is converted to a profile
and each new sequence is separately aligned to that profile,
equivalent to running mafft-profile. This result clearly indicates
that profile alignment must be avoided in this case.
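
To make the procedure concrete, the following is a minimal Python sketch of aligning one new sequence to a profile built from a fixed alignment. It is not MAFFT's implementation: the function names (build_profile, column_score, align_to_profile), the match/mismatch scores, and the linear gap penalty are all illustrative assumptions, and a plain Needleman-Wunsch recursion stands in for whatever scoring scheme a real tool uses. The point it illustrates is that the existing columns are frozen and each new sequence is handled on its own, so the new sequences never inform one another.

```python
from collections import Counter

GAP = "-"

def build_profile(alignment):
    """Convert an alignment (equal-length strings) to per-column frequencies."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        profile.append({res: n / len(alignment) for res, n in counts.items()})
    return profile

def column_score(column, residue, match=1.0, mismatch=-1.0):
    """Frequency-weighted score of a residue against one profile column.

    Gap characters in the column contribute nothing (an assumed convention).
    """
    return sum(freq * (match if res == residue else mismatch)
               for res, freq in column.items() if res != GAP)

def align_to_profile(profile, seq, gap=-2.0):
    """Globally align seq to the profile with a linear gap penalty.

    Returns (score, columns, aligned_seq). columns[k] is the profile column
    index at position k, or None where seq has an insertion.
    """
    n, m = len(profile), len(seq)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + column_score(profile[i - 1], seq[j - 1]),
                score[i - 1][j] + gap,   # profile column opposite a gap in seq
                score[i][j - 1] + gap,   # residue of seq inserted between columns
            )
    # Trace back from the bottom-right corner, preferring diagonal moves.
    i, j, columns, aligned = n, m, [], []
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and score[i][j] ==
                score[i - 1][j - 1] + column_score(profile[i - 1], seq[j - 1])):
            columns.append(i - 1); aligned.append(seq[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            columns.append(i - 1); aligned.append(GAP); i -= 1
        else:
            columns.append(None); aligned.append(seq[j - 1]); j -= 1
    return score[n][m], columns[::-1], "".join(reversed(aligned))

# Toy data (made up for illustration): the profile is built once,
# then every new sequence is aligned to it independently.
existing = ["ACGT", "ACGT", "A-GT"]
profile = build_profile(existing)
for new_seq in ["AGT", "ACGT"]:
    print(new_seq, align_to_profile(profile, new_seq))
```

Because the loop at the end treats every new sequence in isolation, any information shared among the new sequences themselves is ignored in this scheme.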
7 Portability
MAFFT is developed in a UNIX-like environment and therefore runs
natively on Linux and Mac OS X. Previously, however, it did not
run smoothly on Windows. We now provide an all-in-one package for
Windows, which includes SH and other necessary GNU utilities. It
runs almost like a native Windows program and can also be bundled
with other packages or programs.
8 Use of Structural Information
We have been discussing alignments in terms of nucleotide or
amino acid sequences. However, many amino acid sequences fold
into unique tertiary structures. The use of such information in MSA
construction was the basis of the 3DCoffee program [51], and
subsequently PROMALS3D [52]. In this section we address several
issues arising when incorporating protein structural information
in MSA calculations. At the time of this writing, the number of
sequenced proteins far exceeds the number of known structures. It
would appear, then, that the scope of problems that can be
addressed by sequence alignment far exceeds that of structure
alignment. On the other hand, the number of sequence superfamilies
is limited, and a large number of superfamilies contain
members whose structures have been solved. Structural alignment
represents a logical next step towards quantifying the similarity
between remotely homologous families within a superfamily. However,
to make practical use of sequence and structural information, a
number of obstacles have to be overcome. Some of the obstacles are
technical and result from the complexity and noisiness of structural
information. While sequence information is discrete (i.e., 20 common
amino acids) and compact (can be represented by a single
letter), structural information is continuous (e.g., the position of a
particular atom in space) and relatively large (there are between 4
and 14 heavy atoms in the 20 common amino acids). Moreover,
due to the dynamic nature of proteins and limitations in
experimental techniques, it is not uncommon for some atomic
positions to be undefined or to have ambiguous positional
assignments in typical protein structure database entries. The