Biomedical Engineering Reference
is required. The extent of coverage determines the number of residues that need
to be modeled without prior knowledge of their backbone coordinates. There are
exceptions for the lower bounds of both similarity and query coverage, which will
be discussed under remote homology, but if one were to choose a template based
on BLAST results alone, the lower bounds for similarity and coverage are to be
followed strictly to obtain unambiguous structural models. The E-value quantifies
the statistical significance of a “hit”: it is the number of hits with at least a
given score that are expected by chance in a database of a given size. Thus, the
lower the E-value, the greater the significance of a given hit. Generally, E-values
less than 0.01 are considered significant for generating homology models.
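
As a rough illustration of how these criteria could be applied to a list of BLAST hits, the sketch below filters candidate templates by E-value, coverage, and similarity. The `Hit` record, its field names, and the `select_templates` helper are hypothetical and not part of any BLAST toolkit. Only the 0.01 E-value cutoff comes from the text; the coverage and similarity bounds are left as caller-supplied parameters. The `expected_hits` helper uses the standard relation between bit score and E-value, E = m · n · 2^(-S'), where m is the query length, n the database length, and S' the bit score.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one BLAST hit against the PDB."""
    template_id: str   # illustrative identifier for a PDB entry/chain
    evalue: float      # E-value reported for the hit
    coverage: float    # fraction of query residues aligned (0..1)
    similarity: float  # fraction of similar aligned positions (0..1)


def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


def select_templates(hits, max_evalue=0.01, min_coverage=0.0, min_similarity=0.0):
    """Keep hits passing all three filters, most significant first.

    The coverage and similarity defaults apply no filtering; the caller is
    expected to supply the lower bounds appropriate to the project.
    """
    passing = [
        h for h in hits
        if h.evalue < max_evalue
        and h.coverage >= min_coverage
        and h.similarity >= min_similarity
    ]
    return sorted(passing, key=lambda h: h.evalue)


# Usage with made-up hits and placeholder bounds (assumptions, not recommendations).
hits = [
    Hit("tmplA", 1e-30, 0.95, 0.60),
    Hit("tmplB", 0.5, 0.99, 0.70),    # fails the E-value cutoff
    Hit("tmplC", 1e-5, 0.40, 0.55),   # fails the placeholder coverage bound
]
selected = select_templates(hits, min_coverage=0.8, min_similarity=0.5)
print([h.template_id for h in selected])
```

Sorting by E-value alone is a simplification; in practice one would also weigh factors such as the quality of the template structure.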
If no homologs in the PDB are detected using BLAST for a given sequence, the
alternative strategy is to use position-specific iterated BLAST (PSI-BLAST) [ 23 ].
PSI-BLAST constructs a position-specific scoring matrix (PSSM) using the multiple
sequence alignment of BLAST hits detected above a certain threshold (based on
E-value). The PSSM is then used for searching the database. The construction
of the PSSM and the subsequent database search are performed iteratively for
several rounds until no new sequences are found. By using information from all the
BLAST hits of a given iteration, PSI-BLAST helps uncover distant homologs. In
determining the optimal template using PSI-BLAST, one uses the same thresholds
for sequence coverage, similarity, and E-value that were discussed for BLAST.
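
The iterative idea described above can be sketched with a toy model. This is not the PSI-BLAST implementation: it assumes gap-free sequences of equal length, a uniform background frequency, a simple pseudocount, and a raw score threshold in place of an E-value threshold. All function names are illustrative.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequency


def build_pssm(aligned, pseudocount=1.0):
    """Build a log-odds PSSM from equal-length, gap-free aligned sequences."""
    pssm = []
    for col in range(len(aligned[0])):
        # Start every residue at the pseudocount so unseen residues get a finite score.
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in aligned:
            counts[seq[col]] += 1.0
        total = sum(counts.values())
        pssm.append(
            {aa: math.log2((counts[aa] / total) / BACKGROUND) for aa in AMINO_ACIDS}
        )
    return pssm


def score(pssm, seq):
    """Sum the position-specific scores of seq against the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, seq))


def iterative_search(query, database, threshold, max_rounds=5):
    """Rebuild the PSSM from all included sequences until no new hits appear."""
    included = [query]
    for _ in range(max_rounds):
        pssm = build_pssm(included)
        new_hits = [
            s for s in database
            if s not in included and score(pssm, s) >= threshold
        ]
        if not new_hits:
            break
        included.extend(new_hits)
    return included


# "ACGF" scores below the threshold against the query alone, but is picked up
# once "ACDF" has been folded into the PSSM in the first round.
found = iterative_search("ACDE", ["ACDF", "ACGF", "WWWW"], threshold=2.0)
print(found)
```

The example shows why iteration helps: a sequence too divergent to pass the threshold against the query alone can pass once closer homologs have broadened the profile.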
Once a suitable template is identified, it is worthwhile to closely analyze
the sequence alignment between the query and template. Analysis from several
rounds of critical assessment of structure prediction (CASP) [ 24 ] has shown that
the sequence alignment between query and template is the most important step
in comparative modeling. The most prominent inaccuracies in homology models
arise from inaccurate sequence alignment rather than errors in subsequent steps
of structure building. Notably, BLAST scoring matrices and PSSMs may not
incorporate subtle structural details pertinent to the given protein, such as the
positions of cysteines that form structurally important disulphide bridges, proline
residues, and residues important for protein function. Where the positioning of
these residues is known to be important from experimental data, one should manually
edit the alignment to ensure that these residue positions are preserved between the
query and template. Thus, one should consider all available functional, biochemical,
and structural data on the residues of the query sequence while scrutinizing and
updating the sequence alignment between the query and the template.
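
One way to support this manual inspection is to list, for each residue known to be important, what the template carries in the same alignment column. The helper below is a minimal sketch under stated assumptions: the alignment is given as two gapped strings of equal length with `-` as the gap character, and key positions are 1-based indices into the ungapped query. The function name and the example sequences are hypothetical.

```python
def check_key_residues(aligned_query, aligned_template, key_positions):
    """Report the template residue aligned to each key query position.

    Returns {query_position: (query_residue, template_residue, identical)}.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    report = {}
    query_pos = 0  # 1-based index into the ungapped query
    for q, t in zip(aligned_query, aligned_template):
        if q == "-":
            continue  # gap in the query: no query residue in this column
        query_pos += 1
        if query_pos in key_positions:
            report[query_pos] = (q, t, q == t)
    return report


# Made-up alignment: query cysteines at positions 3 and 6, proline at 5.
report = check_key_residues("MKC-APC", "MRCGA-C", {3, 5, 6})
for pos, (q, t, same) in sorted(report.items()):
    print(pos, q, t, "preserved" if same else "CHECK")
```

Positions flagged `CHECK` (here the proline aligned to a gap) are candidates for manual adjustment of the alignment.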
2.3 Remote Homology
If a template is not detectable with BLAST or PSI-BLAST, one needs to use
programs that are capable of identifying distant evolutionary relationships. It has
been shown that two proteins can share a high degree of structural similarity
in spite of the lack of detectable sequence similarity [ 6 ]. The lack of sequence
similarity in these cases highlights high divergence of the sequences and also