Protein Structure Modeling and Docking at the Swiss Institute of Bioinformatics - Bioinformatics: A Swiss Perspective

of structural similarities that become very limited when the sequence identity levels drop below a certain threshold, usually around 30%. Experience shows that, with decreasing identity levels, alignment errors first appear in loop regions before affecting larger portions of the proteins, and that it is close to impossible to properly align loops of low sequence identity and unequal lengths, leaving it up to the later model-building process to find adequate solutions for loop structures. Furthermore, many protein structures sharing a sequence identity level below 40% contain structurally nonconserved loops, even if they have the same length. Therefore, it becomes apparent that even the alignment of loops with identical length but completely different sequences has little meaning in structural biology.
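The identity thresholds quoted above (around 30% and 40%) presuppose some way of computing percent identity from a pairwise alignment. The following is a minimal sketch, not a reference implementation: the function name `percent_identity`, the gap character `-`, and the example sequences are all hypothetical, and the choice of denominator (columns where neither sequence has a gap) is an assumption, since several conventions are in use.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the columns where neither sequence has a gap.

    Assumption: both strings are rows of the same pairwise alignment,
    so they must have equal length. Other denominators (for example the
    length of the shorter sequence) give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    aligned_columns = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # gapped columns (often loops) are not counted
        aligned_columns += 1
        if res_a.upper() == res_b.upper():
            matches += 1

    if aligned_columns == 0:
        return 0.0
    return 100.0 * matches / aligned_columns


# Hypothetical toy alignment: 9 ungapped columns, 7 identical residues.
target_row = "MKT-AYIAKQR"
template_row = "MKSGAYLAK-R"
identity = percent_identity(target_row, template_row)
print(f"{identity:.1f}% identity")
```

Note that a single number of this kind says nothing about where the mismatches and gaps fall; as described above, they tend to concentrate in loop regions first.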

4.1.2. Model accuracy

The accuracy of a protein model is largely limited by the deviation of the template structure(s) used from the experimental structure of the target. This limitation is inherent to the method, since comparative models result from a structural extrapolation guided by a sequence alignment.
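Such deviations between two structures are commonly expressed as a root mean square deviation (RMSD) over equivalent Cα atoms. The sketch below shows only the arithmetic, under loudly stated assumptions: the two coordinate lists are already optimally superimposed and paired atom by atom, the function name `ca_rmsd` is hypothetical, and the coordinates are made-up values in Å rather than data from any real structure.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def ca_rmsd(model: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """RMSD in the units of the input coordinates (here Å).

    Assumption: the structures are already superimposed and the i-th
    entry of each list refers to the same core residue. No fitting or
    superposition is performed here.
    """
    if len(model) != len(reference):
        raise ValueError("both structures must list the same number of atoms")
    if not model:
        raise ValueError("at least one atom pair is required")

    squared_sum = 0.0
    for (mx, my, mz), (rx, ry, rz) in zip(model, reference):
        squared_sum += (mx - rx) ** 2 + (my - ry) ** 2 + (mz - rz) ** 2
    return math.sqrt(squared_sum / len(model))


# Hypothetical coordinates: every model atom is displaced by exactly 1 Å.
reference_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_ca = [(0.0, 0.0, 1.0), (3.8, 0.0, -1.0), (7.6, 1.0, 0.0)]
print(f"Calpha RMSD: {ca_rmsd(model_ca, reference_ca):.2f} A")
```

In practice the superposition step matters as much as the formula, and dedicated structural biology toolkits handle both; this sketch leaves it out deliberately to keep the definition visible.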

As shown by comparison of experimentally elucidated structures, there is a direct correlation between the sequence identity level of a protein pair and the deviation of the Cα atoms of their common core [36]. It is therefore generally accepted that the percentage of sequence identity between target and template allows for a reasonable first estimate of the model quality, and that the core Cα atoms of protein models sharing 50% sequence identity with their templates will deviate by approximately 1.0 Å root mean square deviation (RMSD) from their experimentally elucidated structures [36]; this is roughly comparable to the accuracy of a medium-resolution NMR-derived structure or a low-resolution X-ray structure [37,38]. This has led to the definition of three broad classes of model quality based on the level of identity of the core region common to both target and template sequences. Firstly, models based on more than 50% identity are high-accuracy models, where inaccuracies are mostly restricted to side-chain packing and loop regions. Secondly, comparative models based on 30% to 50% sequence identity can be considered medium-accuracy models, where the most frequent inaccuracies
