are found in side-chain packing, slight distortions of the protein core,
and inaccurate loop conformations. Thirdly, templates whose cores share
less than 30% identity with the target lead to low-accuracy models,
where inaccuracies become more important and distortions more severe.
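The 30% figure above can be made concrete with a minimal sketch in Python. It computes percent identity from an existing pairwise alignment and flags the low-accuracy regime. The function names, the toy sequences, and the choice to count only columns where neither sequence has a gap are assumptions made for illustration. Other conventions for the denominator exist, and the text refers to identity over the protein core, which this sketch does not isolate.

```python
LOW_ACCURACY_THRESHOLD = 30.0  # percent identity, the figure quoted in the text


def percent_identity(aligned_target: str, aligned_template: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: both strings come from the same pairwise alignment and use
    '-' as the gap character.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_target.upper(), aligned_template.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def is_low_accuracy_regime(identity_percent: float) -> bool:
    """True when identity falls below the 30% figure quoted in the text."""
    return identity_percent < LOW_ACCURACY_THRESHOLD


# Hypothetical toy alignment, not taken from any real protein.
target = "MKT-AYIAKQR"
template = "MKSLAYLA-QR"
identity = percent_identity(target, template)
print(f"identity = {identity:.1f}%, low accuracy expected: "
      f"{is_low_accuracy_regime(identity)}")
```

In practice the alignment would come from a dedicated alignment tool, and the identity would be evaluated for each candidate template before one is selected.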
Sequence identity is, however, not the only way templates influence
protein model accuracy, and one should not overlook other possible
contributions in the course of a modeling process. Indeed, the templates,
which are obtained through experimental approaches, are subject to
structural variations caused not only by experimental errors and
differences in data collection conditions (e.g. temperature 39), but also
by different crystal lattice contacts and the presence or absence of
ligands. 40 A direct consequence of the comparative approach is that these
influences are carried over to the models derived from these templates.
This calls for increased attention to the template selection process and
a good understanding of the factors that influenced the experimental
structure elucidation of each template.
4.2. Limitations of Comparative Protein Modeling
4.2.1. Template availability and structural diversity
It is generally accepted that a very small number of different folds
account for the majority of known structures, 41 and a recent study has
argued that most sequences could already be modeled using known folds
(or fragments of known folds) as templates. 42 Thus, for a large
proportion of protein domains, a structure with a similar fold would be
available in the PDB. However, models based on alignments with low
sequence identity generally provide accurate information only about the
overall fold of the protein. As the correctness and accuracy of
comparative models rapidly decrease when the sequence identity between a
target and template drops below 30%-35%, a much denser coverage of the
sequence space with experimentally elucidated protein structures is
necessary to create adequate protein models for the majority of domains.
Ideally, one should find a template in the PDB with 30%-35% (or more)
sequence identity for every target. It is, however, important to
remember that