The homology modeling procedure for a query protein consists of four steps: (i)
identification of at least one homologous protein whose 3D structure is available
(the template), (ii) alignment of the query sequence to the template sequence,
(iii) construction of the model from the template structure, guided by the sequence
alignment, and (iv) assessment and refinement of the model [44]. Several software
packages use the known 3D structure of a protein to build a model of the 3D
structure of a homologous protein; MODELLER [45] is one example.
The accuracy of the resulting model depends strongly on the second step (the
alignment), whose reliability is in turn determined by the evolutionary distance
between the homologous proteins [46]. This evolutionary distance can be estimated
approximately by the percentage of sequence identity (%seqID) between the query
and template sequences. Structural similarity decreases non-linearly as %seqID
decreases. Homologous proteins with a %seqID above 50% have very similar 3D
structures, whereas below 25% only the “core” of the homologous proteins, i.e., a
set of structural features (usually secondary structure elements), is conserved
across all members of the protein family. In this so-called “twilight zone”,
homologous sequences harbor many insertions/deletions that make their alignment
unreliable, with adverse consequences for model accuracy: molecular modeling
techniques cannot recover from an alignment error, so the resulting model is
irremediably wrong. For instance, because an α-helix advances by about 100° per
residue (3.6 residues per turn), a two-residue shift in the alignment of residues
belonging to TMHs rotates their side chains by about 200°. Residues that ought to
line the binding pocket, and thus potentially interact with the ligand, end up on
the opposite face of the helix, facing the membrane lipids.
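The two quantities this argument rests on, %seqID and the rotation caused by an alignment shift in a helix, can be made concrete with a short Python sketch. It assumes one common definition of %seqID (identical positions divided by gap-free aligned positions; other tools divide by the shorter sequence length or by the full alignment length, which gives different values), uses the 50% and 25% figures from the text as hard cut-offs purely for illustration, and assumes an ideal α-helix with 100° of rotation per residue. All function names are hypothetical.

```python
# Thresholds from the text, used here as hard cut-offs for illustration only.
HIGH_IDENTITY = 50.0   # above: very similar 3D structures
TWILIGHT_ZONE = 25.0   # below: only the structural core is conserved

# Ideal alpha-helix: 3.6 residues per turn, i.e. 100 degrees per residue.
DEGREES_PER_RESIDUE = 100.0


def percent_identity(query_aln: str, template_aln: str, gap: str = "-") -> float:
    """%seqID over aligned columns where neither sequence has a gap.

    Assumption: identical / gap-free aligned positions. Other conventions
    (shorter sequence length, full alignment length) give other numbers.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    aligned = identical = 0
    for q, t in zip(query_aln.upper(), template_aln.upper()):
        if q == gap or t == gap:
            continue  # insertion/deletion column: not counted
        aligned += 1
        if q == t:
            identical += 1
    if aligned == 0:
        return 0.0
    return 100.0 * identical / aligned


def identity_regime(seq_id: float) -> str:
    """Coarse reading of a %seqID value using the two thresholds above."""
    if seq_id > HIGH_IDENTITY:
        return "very similar 3D structures"
    if seq_id < TWILIGHT_ZONE:
        return "twilight zone: only the core is conserved"
    return "intermediate: alignment quality becomes critical"


def helix_face_shift(register_shift: int) -> float:
    """Angle (0-180 degrees) by which a side chain moves around the helix
    axis when the alignment is shifted by `register_shift` residues."""
    angle = (register_shift * DEGREES_PER_RESIDUE) % 360.0
    return min(angle, 360.0 - angle)
```

For example, `percent_identity("ACDE-G", "ACDQWG")` returns 80.0 (4 identical residues out of 5 gap-free columns), and `helix_face_shift(2)` returns 160.0, which is a 200° rotation measured the short way around the helix, i.e., roughly the opposite face.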
An additional challenge is the modeling of loops, which are the most variable
regions of proteins. Some loops are important for GPCR function; for instance,
ECL2 is known to interact with ligands. For homologous proteins close to the
twilight zone, loop conformations are usually not conserved, as illustrated by the
known 3D structures of GPCRs. Therefore, loops cannot be built by homology
modeling; one must resort instead to knowledge-based or de novo modeling. In the
knowledge-based approach, one searches protein structure databases for loops that
have the same length as the loop to be modeled and a distance between their N- and
C-termini that matches the gap in the model backbone, so that both ends of the loop
can be connected easily. This works well for loops of up to 8 residues, with an
average RMSD of 1.35 Å for 8-residue loops [47]. In the de novo approach, the
model is described by a detailed physical energy function and the conformational
space of the loop to be modeled is sampled, as exhaustively as possible, with
algorithms such as Monte Carlo or molecular dynamics simulations. These techniques
reproduce the conformations of loops of up to 12 residues with an average RMSD of
2 Å. Accurately modeling longer loops remains difficult [48]. This is clearly a
problem for ECL2, which can be up to 30 residues long in some GPCRs, unfortunately
including ORs.
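The knowledge-based loop search can be sketched in Python as a filter on loop length and end-to-end distance. This is a simplified illustration, not necessarily the procedure evaluated in [47]: fragments are hypothetical records holding only C-alpha coordinates, their first and last C-alpha atoms are assumed to coincide with the two anchor residues on the model backbone (so `loop_length` counts them), and the 0.5 Å `tolerance` is arbitrary. Real tools typically also superpose the stem residues and check for steric clashes. The `ca_rmsd` helper shows how an RMSD such as the quoted 1.35 Å is computed for two loops already expressed in the same coordinate frame.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass
class LoopFragment:
    """Loop excised from a known structure (hypothetical record layout)."""
    source_id: str          # illustrative identifier, e.g. structure + residue range
    ca_coords: List[Point]  # C-alpha coordinates, N- to C-terminus


def end_to_end(coords: Sequence[Point]) -> float:
    """Distance between the first and last C-alpha atoms."""
    return math.dist(coords[0], coords[-1])


def search_loops(database: Sequence[LoopFragment], loop_length: int,
                 n_anchor: Point, c_anchor: Point,
                 tolerance: float = 0.5) -> List[LoopFragment]:
    """Fragments of the requested length whose end-to-end distance matches
    the gap between the two anchor residues, best match first.

    Assumption: a fragment's first and last C-alpha atoms correspond to the
    anchor residues, so loop_length counts them.
    """
    gap = math.dist(n_anchor, c_anchor)
    hits = [f for f in database
            if len(f.ca_coords) == loop_length
            and abs(end_to_end(f.ca_coords) - gap) <= tolerance]
    # Rank by how closely each fragment spans the backbone gap.
    hits.sort(key=lambda f: abs(end_to_end(f.ca_coords) - gap))
    return hits


def ca_rmsd(model: Sequence[Point], reference: Sequence[Point]) -> float:
    """C-alpha RMSD of two equally long loops already in the same frame
    (no superposition is performed here)."""
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(model, reference))
    return math.sqrt(squared / len(model))
```

Filtering on these two criteria first is cheap and discards most of a database before any costlier geometric fitting is attempted.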
Once a model is obtained, it can be used as a platform for computer-aided ligand
discovery. This is done with virtual screening techniques, which are commonly used
by pharmaceutical companies worldwide to screen millions of chemical compounds
in silico, in order to rank them and select a limited number for further
experimental testing.
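The final rank-and-select step of virtual screening can be sketched in Python. The scoring itself (docking or another method) is not described here, so `score_fn` is a hypothetical placeholder, and the sketch assumes that lower scores are better, as with scores expressed as estimated binding energies. Passing a generator to `heapq.nsmallest` keeps only the `n` best compounds in memory, which matters when the library holds millions of entries.

```python
import heapq
from typing import Callable, Iterable, Iterator, List, Tuple

Scored = Tuple[str, float]


def score_library(library: Iterable[Tuple[str, str]],
                  score_fn: Callable[[str], float]) -> Iterator[Scored]:
    """Lazily score (compound_id, structure) pairs.

    `score_fn` is a hypothetical stand-in for a docking or other scoring
    program; nothing is stored, each result is yielded as it is computed.
    """
    for compound_id, structure in library:
        yield compound_id, score_fn(structure)


def select_hits(scored: Iterable[Scored], n: int) -> List[Scored]:
    """Return the n best-scoring compounds, best first.

    Assumes lower scores are better (e.g. estimated binding energies).
    With a generator input, heapq.nsmallest holds at most n items while
    consuming the stream.
    """
    return heapq.nsmallest(n, scored, key=lambda item: item[1])
```

With a real scoring function, `select_hits(score_library(library, score_fn), 1000)` would return the 1,000 best-scored compounds for experimental follow-up.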