used a progressive method with the sum-of-pairs BLOSUM62
[29] scores to align sequences within each group. Such an
approach does not perform as well as some recent alignment
methods. In the later development of PROMALS3D, we used
MAFFT (options: --maxiterate 1000 --localpair) to align the
sequences within each group, which gives better alignment
quality for each pre-aligned group.
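The within-group alignment step can be sketched as a small wrapper around the MAFFT command line. This is only an illustration, not the PROMALS3D code: the function names and the FASTA path are hypothetical, and it assumes a `mafft` executable on the PATH. MAFFT spells the two options quoted above with double dashes and writes the alignment to standard output.

```python
import shutil
import subprocess


def mafft_command(fasta_path):
    """Build the MAFFT call with the two options quoted in the text."""
    return ["mafft", "--maxiterate", "1000", "--localpair", fasta_path]


def align_group(fasta_path):
    """Align one group of sequences and return the alignment as FASTA text.

    Hypothetical helper: assumes `mafft` is installed and that
    `fasta_path` holds the unaligned sequences of a single group.
    """
    if shutil.which("mafft") is None:
        raise RuntimeError("mafft executable not found on PATH")
    result = subprocess.run(
        mafft_command(fasta_path),
        capture_output=True,  # alignment on stdout, progress log on stderr
        text=True,
        check=True,           # raise if MAFFT exits with an error
    )
    return result.stdout
```

Calling `align_group("group_01.fasta")` (a made-up file name) would return the aligned sequences for that group as a string.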
1. The core steps of the PROMALS3D method use advanced
techniques to align the relatively divergent pre-aligned groups
with additional information from sequence and structure
databases. First, a representative sequence is selected from each
pre-aligned group, giving rise to N2 representatives. Instead of
using the longest sequence as the representative, as in our
original PROMALS method, we select the sequence that has the
highest average similarity to the other sequences in the same
pre-aligned group.
2. Each representative sequence is subjected to PSI-BLAST [30]
iterations against the UniRef90 database [31] to retrieve
sequence homologs. The sequence profile from the PSI-BLAST
search is used by PSIPRED [32] to predict secondary structure.
3. For each pair of representative sequences, we use a
probabilistic model to obtain posterior profile-profile alignment
probabilities for each position pair via the forward-backward
algorithm. Strictly speaking, our probabilistic model for
profile-profile comparison is not a hidden Markov model
(HMM) as originally proposed [19], but a conditional
random field (CRF) [33], since we allow observation-dependent
transitions between hidden states. In our model,
the transition probabilities depend on predicted secondary
structures, which are treated as a type of observation. As with
HMMs, the forward-backward algorithm can be applied to
CRFs to obtain posterior alignment probabilities, which serve
as profile-derived alignment constraints.
4. The PSI-BLAST profile is used to search a database of
sequences with known structures to retrieve homologs with 3D
structures (homolog3Ds). Multiple homolog3Ds can be identified
and used for one representative sequence, e.g., if it contains
several distinct domains with known spatial structures.
Structure-derived alignment constraints for two representative
sequences are deduced from profile-based
representative-to-homolog3D alignments and structure-based
homolog3D-to-homolog3D alignments [23].
2.3 Aligning Pre-aligned Groups Enhanced with Evolutionary and Structural Information
5. Profile-derived alignment constraints and structure-derived
alignment constraints are combined for all pairs of
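To make the forward-backward step in item 3 concrete, here is a minimal sketch that computes posterior match probabilities with a textbook three-state pair HMM (match, gap in one sequence, gap in the other). It is a simplification under stated assumptions, not the PROMALS3D model: the transition probabilities are fixed rather than dependent on predicted secondary structure, and the function name, the `match` emission matrix, and the parameters `delta`, `eps`, and `q` are all hypothetical.

```python
def posterior_match_probs(match, delta=0.1, eps=0.3, q=1.0):
    """Posterior probability that position i of A is aligned to j of B.

    match[i][j]: positive emission weight for pairing position i with j
                 (assumed to come from some profile-profile score).
    delta, eps:  gap-open and gap-extend probabilities (illustrative).
    q:           emission weight of a gapped position (illustrative).
    """
    n, m = len(match), len(match[0])

    def grid():
        # One extra row and column of zeros pads the backward recursion.
        return [[0.0] * (m + 2) for _ in range(n + 2)]

    # Forward pass: f*[i][j] sums all paths emitting A[:i], B[:j].
    fM, fX, fY = grid(), grid(), grid()
    fM[0][0] = 1.0  # start in the match state before any emission
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                fM[i][j] = match[i - 1][j - 1] * (
                    (1 - 2 * delta) * fM[i - 1][j - 1]
                    + (1 - eps) * (fX[i - 1][j - 1] + fY[i - 1][j - 1])
                )
            if i > 0:
                fX[i][j] = q * (delta * fM[i - 1][j] + eps * fX[i - 1][j])
            if j > 0:
                fY[i][j] = q * (delta * fM[i][j - 1] + eps * fY[i][j - 1])
    total = fM[n][m] + fX[n][m] + fY[n][m]

    # Backward pass: b*[i][j] sums all path suffixes leaving cell (i, j).
    bM, bX, bY = grid(), grid(), grid()
    bM[n][m] = bX[n][m] = bY[n][m] = 1.0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                continue
            e = match[i][j] * bM[i + 1][j + 1] if i < n and j < m else 0.0
            gx = q * bX[i + 1][j]  # zero when i == n (padding row)
            gy = q * bY[i][j + 1]  # zero when j == m (padding column)
            bM[i][j] = (1 - 2 * delta) * e + delta * (gx + gy)
            bX[i][j] = (1 - eps) * e + eps * gx
            bY[i][j] = (1 - eps) * e + eps * gy

    # Posterior of the match state at each cell, as an n x m matrix.
    return [
        [fM[i + 1][j + 1] * bM[i + 1][j + 1] / total for j in range(m)]
        for i in range(n)
    ]
```

The returned matrix plays the role of the profile-derived alignment constraints described in item 3. This sketch works in plain probability space, which is fine for short inputs; longer sequences would need scaling or log-space arithmetic to avoid underflow.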