Biomedical Engineering Reference
Fig. 4 Construction of structural-profile databases. The scheme illustrates the steps involved in constructing structural-profile databases starting from the structures in PDB.
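To illustrate the structure-profile of Fig. 4 and how a query might be scored against it, the following Python sketch derives per-position amino-acid propensities from a structure-based alignment of one structural family, then aligns a query to that profile. It is a minimal sketch, not the scoring function of any particular threading program. It assumes the structure-based alignment is already available, uses add-one pseudocounts and a uniform background, reduces the secondary-structure term to a binary match between a predicted query state and a DSSP-derived template state (three-state H/E/C labels assumed), and uses a linear gap penalty. The function names `structure_profile`, `log_odds` and `align_score` and the weights `w_seq`, `w_ss` and `gap` are hypothetical.

```python
import math
from collections import Counter

# Assumption: the 20 standard amino acids; other characters ('-' gaps, 'X') are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def structure_profile(aligned_seqs, pseudocount=1.0):
    """Per-column amino-acid propensities from a structure-based alignment.

    aligned_seqs: equal-length strings from one structural family, '-' = gap.
    Returns one dict (amino acid -> frequency) per alignment column.
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("aligned sequences must all have the same length")
    profile = []
    for col in range(length):
        counts = Counter(s[col] for s in aligned_seqs if s[col] in AMINO_ACIDS)
        # Add-one pseudocounts (hypothetical choice) keep unseen residues non-zero.
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile


def log_odds(profile):
    """Convert frequencies to log-odds against a uniform background (assumption)."""
    background = 1.0 / len(AMINO_ACIDS)
    return [{aa: math.log(col[aa] / background) for aa in AMINO_ACIDS} for col in profile]


def align_score(query, query_ss, lo_profile, template_ss, w_seq=1.0, w_ss=0.5, gap=-2.0):
    """Global (Needleman-Wunsch style) alignment score of a query against one fold.

    query_ss: predicted state per query residue (e.g. from PSIPRED), 'H'/'E'/'C'.
    template_ss: DSSP-derived state per template position, reduced to 'H'/'E'/'C'.
    w_seq, w_ss and gap are hypothetical weights, not those of any real program.
    """
    n, m = len(query), len(lo_profile)
    if len(query_ss) != n or len(template_ss) != m:
        raise ValueError("secondary-structure strings must match sequence/profile lengths")
    # dp[i][j]: best score aligning query[:i] with template positions [:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Sequence term: log-odds propensity of the query residue at this fold position.
            seq_term = lo_profile[j - 1].get(query[i - 1], 0.0)
            # Secondary-structure term: reward agreement of predicted and observed state.
            ss_term = 1.0 if query_ss[i - 1] == template_ss[j - 1] else 0.0
            match = w_seq * seq_term + w_ss * ss_term
            dp[i][j] = max(dp[i - 1][j - 1] + match, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[n][m]


if __name__ == "__main__":
    # Toy family alignment and template states (made-up data, for illustration only).
    family = ["ACDE", "ACDE", "ACEE"]
    lo = log_odds(structure_profile(family))
    print(align_score("ACDE", "HHHH", lo, "HHHH"))
    print(align_score("WWWW", "HHHH", lo, "HHHH"))
```

The weighted sum inside `align_score` is a simplified stand-in for the idea that each program combines several optimally weighted terms into one goodness-of-fit score; a fuller version would also add environment terms (burial, hydrogen bonding) and residue-residue contact terms for each template position.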
threading program has its own optimized, intricate protocol and scoring system
to identify structural templates, we only discuss the general principles underlying
these programs rather than the scoring functions of specific programs. There are two
groups of data available to the threading programs to generate an optimal alignment:
data on the query sequence and data on all possible structural templates. Data on
the query side consist of (Fig. 3): (1) the sequence-profile of the query sequence,
generated using either PSI-BLAST alone or PSI-BLAST combined with HMM programs, and
(2) the secondary-structure propensity of each position of the query sequence,
which can be determined using neural-network or HMM-based programs such as
PSIPRED [29] or Jpred [30]. Data on the template side are significantly richer. First,
all known structures can be grouped into structural families based on structural
similarity, and the sequences in each family can be aligned. This sequence alignment,
which is derived primarily from the structural alignment, gives rise to residue
propensities at each position of the fold, which we denote as the structure-profile
(Fig. 4). Second, one can obtain the
secondary structure at each position of the fold using the dictionary of protein
secondary structure (DSSP) program [36]. Third, one can obtain the environment
of each position of the fold: whether it is buried or exposed, and whether its backbone
or side chain is involved in any hydrogen bonds (Fig. 4). Fourth, distance- or
cutoff-based residue-residue contact probabilities can be obtained for each structural
family. These four pieces of information are used in a combinatorial fashion by
different programs to match the two pieces of information available for the query
sequence. Thus, each program uses a combination of terms that are optimally
weighted to arrive at a final score that reflects the goodness of fit between a
query sequence and a template structure (or a structural family, depending on the
program). One way to align a structure to a sequence is to match the structure-profile
of the template (amino-acid propensities at each position of the fold) to the