sequence has an associated 3D structure deposited in the PDB
[ 32 ], then the secondary structure elements of this protein are
assigned using DSSP [ 33 ] and so do not need to be predicted. In
PRALINE, the DSSP information is found based on the (FASTA)
sequence definition line. 1
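The lookup rule described in the footnote can be sketched as a strict pattern match. This is a minimal illustration, not PRALINE's actual code: the function name `extract_pdb_id` and the exact pattern (a four-character PDB identifier, optionally followed directly by a single chain character, with nothing else on the line) are assumptions inferred from the footnote's examples.

```python
import re

# Assumed pattern: '>' + 4-character PDB ID (a digit followed by three
# alphanumerics) + an optional single chain character, and nothing else.
_PDB_DEFLINE = re.compile(r">([0-9][A-Za-z0-9]{3})([A-Za-z0-9])?")


def extract_pdb_id(defline):
    """Return (pdb_id, chain) if the definition line matches, else None.

    chain is None when the definition line carries no chain character.
    """
    match = _PDB_DEFLINE.fullmatch(defline)
    if match is None:
        return None  # any other definition line is skipped
    return match.group(1).upper(), match.group(2)


if __name__ == "__main__":
    for line in [">102LA", ">102L_A", ">102L|A", ">pdb|102L|A",
                 ">gi|157829524|pdb|102L|A", ">102L_A "]:
        print(repr(line), "->", extract_pdb_id(line))
```

Using `fullmatch` rather than `match` is what makes a trailing space or a trailing description cause the line to be skipped.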
After the secondary structure delineation step, PRALINE
applies its secondary structure scoring scheme. This is a soft
scheme that aligns secondary structure elements using residue
mutation probabilities observed in alpha helix, beta strand, or
coil conformations (Fig. 3 ). Residue positions
with identical secondary structure assignments are scored using
Lüthy helix-, strand-, and coil-specific matrices [ 34 ], while residue
positions with nonidentical secondary structure assignments are
scored using the generic scoring matrix (e.g., BLOSUM62 [ 12 ]).
Because secondary structure matching is steered by exchange values
rather than enforced as a hard constraint, different structures can
still become matched (e.g., a helix with a coil structure). This means that
the method can reasonably deal with errors in the annotation of
secondary structure elements.
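The matrix selection rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the dictionary layout, and the two-residue toy matrices are hypothetical, and the numbers are placeholders chosen for illustration, not values taken from the Lüthy matrices or BLOSUM62.

```python
def lookup(matrix, a, b):
    """Symmetric lookup in a dict keyed by residue pairs."""
    return matrix[(a, b)] if (a, b) in matrix else matrix[(b, a)]


def position_score(res_a, ss_a, res_b, ss_b, ss_specific, generic):
    """Score one aligned residue pair.

    Identical secondary structure states ('H', 'E' or 'C') use the
    matrix specific to that state; differing states fall back to the
    generic matrix, so a mismatch is scored rather than forbidden.
    """
    if ss_a == ss_b:
        return lookup(ss_specific[ss_a], res_a, res_b)
    return lookup(generic, res_a, res_b)


# Toy placeholder matrices over residues A and L only (hypothetical values).
GENERIC = {("A", "A"): 4, ("A", "L"): -1, ("L", "L"): 4}
SS_SPECIFIC = {
    "H": {("A", "A"): 5, ("A", "L"): 0, ("L", "L"): 5},
    "E": {("A", "A"): 3, ("A", "L"): -2, ("L", "L"): 6},
    "C": {("A", "A"): 4, ("A", "L"): -1, ("L", "L"): 3},
}

if __name__ == "__main__":
    # Both residues in a helix: the helix-specific matrix is used.
    print(position_score("A", "H", "L", "H", SS_SPECIFIC, GENERIC))
    # Helix against coil: the generic matrix is used instead.
    print(position_score("L", "H", "A", "C", SS_SPECIFIC, GENERIC))
```

Because a helix-coil pairing still receives an ordinary substitution score, a mispredicted element lowers the alignment score locally instead of blocking the match.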
The TM regions of membrane-bound proteins show a different
hydrophobicity pattern compared to globular soluble proteins
[ 35 ]. This is because they are immersed in a largely hydrophobic
environment as opposed to the more hydrophilic nature of the
cytosol. Conventional scoring matrices, which are tailored to soluble
proteins, are therefore not optimally suited for aligning
membrane-bound proteins.
PRALINE is not the first alignment method that combines
information from different substitution matrices in order to
improve the quality of TM protein alignment. One of the earlier
methods that attempted a similar approach was STAM [ 36 ].
However, this method incorporates the TM information in a
“hard” way. First, TM regions are aligned separately, thereby
anchoring the alignment, after which the intervening stretches are
aligned. This means that the method is crucially dependent upon
the quality of the annotation of TM regions, and would have
difficulty, for example, in aligning 7-TM sequences if fewer than
seven TM regions were predicted for some of the sequences.
2.5 Transmembrane-Aware Protein Alignment
In PRALINE, TM information is taken into account in a more
flexible way, which consists of three steps [ 37 ]. Firstly, the TM
topology for each input sequence is predicted using a TM prediction
tool. The user can select one out of three predictors:
1 PRALINE finds the PDB identifier of a protein by extracting it from the FASTA definition line of that protein. For
example, a description line such as “>102LA” is fine. For any other description line, the PDB identifier is not
extracted. No description may follow the sequence identifier. Thus “>102L_A”, “>102L|A”, “>pdb|102L|A”,
“>gi|157829524|pdb|102L|A”, and also “>102L_A ” (note the trailing space) are skipped.