3. Large-Scale Protein Structure Prediction and Structural Genomics
Comparative protein structure modeling and experimental protein
structure determination complement each other, with the long-term goal
of making 3D atomic-level information for most proteins obtainable from
their corresponding amino acid sequences. Structural genomics is a
worldwide effort that aims to determine a large number of protein
structures rapidly, using X-ray crystallography and NMR spectroscopy in
a high-throughput mode [27-29]. As a result of concerted efforts in
technology and methodology development in recent years, each step of
experimental structure determination has become more efficient, less
expensive, and more likely to succeed [30]. Structural genomics
initiatives are making a significant contribution to both the scope and
depth of our structural knowledge about protein families. Although
worldwide structural genomics initiatives account for only about 20% of
the new structures, they contribute approximately three quarters of the
new structurally characterized families and over five times as many
novel folds as classical structural biology [31-35].
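To make the comparison concrete, the two shares quoted above can be turned into a per-structure yield. The following is a minimal sketch, assuming that the 20% and the three quarters are shares of the same pools of new structures and new families; the function name is hypothetical and the result is only as precise as those rounded figures.

```python
def per_structure_enrichment(share_of_structures: float,
                             share_of_families: float) -> float:
    """Ratio of new families per structure: one contributor versus all others.

    Both arguments are fractions of the same totals (assumption, not a
    statement from the cited studies).
    """
    if not (0 < share_of_structures < 1 and 0 < share_of_families < 1):
        raise ValueError("shares must lie strictly between 0 and 1")
    # New families obtained per deposited structure, in relative units.
    contributor_yield = share_of_families / share_of_structures
    others_yield = (1 - share_of_families) / (1 - share_of_structures)
    return contributor_yield / others_yield


# Structural genomics: about 20% of structures, about 75% of new families.
enrichment = per_structure_enrichment(0.20, 0.75)
print(f"per-structure enrichment: {enrichment:.1f}x")
```

Under these assumptions a structural genomics structure is roughly an order of magnitude more likely to represent a new family than a structure from classical structural biology.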
In light of the ever-growing amount of genome sequencing data, the
structures of most proteins will be modeled rather than elucidated
experimentally, even with structural genomics. From a modeling-centric
perspective, structural genomics targets should be selected such that
most of the remaining sequences can be modeled with useful accuracy by
comparative modeling. The accuracy of comparative models currently
declines sharply below 30% sequence identity between target and
template. Thus, template selection strategies should aim at systematic
sampling of protein structures to ensure that most of the remaining
sequences are related to at least one experimentally elucidated
structure at more than 30% sequence identity. Using this cut-off, it
has been estimated that a minimum of 16 000 targets must be determined
to cover 90% of all protein domain families, including those of
membrane proteins [29]. Such estimates vary widely, depending on the
level of sequence identity that is assumed to ensure sufficiently
accurate model building and on how the coverage is calculated.
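The dependence of such estimates on the identity cut-off and on the counting scheme can be illustrated with a small calculation. This is a minimal sketch on invented toy data, not the procedure used in the cited estimate: the family names, identities, and family sizes are hypothetical, and the function simply counts a family as covered when its best identity to any solved structure exceeds the cut-off.

```python
from typing import Mapping, Optional


def coverage(best_identity: Mapping[str, float],
             cutoff: float = 30.0,
             weights: Optional[Mapping[str, float]] = None) -> float:
    """Fraction covered by at least one template above the identity cutoff.

    best_identity: family -> highest percent identity to a solved structure.
    weights: optional family -> size (e.g. number of sequences); if omitted,
             every family counts equally.
    """
    total = 0.0
    covered = 0.0
    for family, identity in best_identity.items():
        w = 1.0 if weights is None else weights[family]
        total += w
        if identity > cutoff:
            covered += w
    if total <= 0:
        raise ValueError("need at least one family with positive weight")
    return covered / total


# Hypothetical toy data: best template identity (%) and family size.
toy_identity = {"famA": 62.0, "famB": 35.0, "famC": 28.0,
                "famD": 0.0, "famE": 41.0}
toy_sizes = {"famA": 500, "famB": 300, "famC": 150, "famD": 40, "famE": 10}

for cutoff in (25.0, 30.0, 40.0):
    per_family = coverage(toy_identity, cutoff)
    per_sequence = coverage(toy_identity, cutoff, toy_sizes)
    print(f"cutoff {cutoff:.0f}%: {per_family:.2f} of families, "
          f"{per_sequence:.2f} of sequences")
```

Even on this toy set, moving the cut-off or switching from per-family to per-sequence counting changes the reported coverage noticeably, which is one reason why published target estimates differ.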