Protein Structure and its Classification
Andrew J. MILES, Clare E. SANSOM and Bonnie A. WALLACE
School of Crystallography, Birkbeck College, University of London, London, UK
Abstract. Description of protein structure is based on a hierarchy of concepts, from
the peptide bond to secondary structures, motifs and folds. The classification of
protein structures is usually achieved by segregating mainly-alpha, mainly-beta, and
mixed (alpha/beta and alpha+beta) structures. This chapter gives an overview of
structural concepts as well as examples of how these are implemented in databases
such as CATH, SCOP and FSSP.
Introduction
There are a vast number of ways to fold a polypeptide chain into a compact structure;
however, the number of possible folds is limited according to the following thermodynamic
argument [1]: Protein folding is partly driven by the sequestration of hydrophobic
sidechains into the molecule's interior where the backbone polar groups must interact to
prevent hydrogen bonding with the solvent that would push the equilibrium towards the
unfolded state. Thus short stretches of the chain adopt regular conformations called
secondary structure in which internal hydrogen bonding between the backbone amide and
carbonyl groups is optimised. The two main secondary structures, α-helices and β-sheets,
traverse the molecule from one side to the other where a loop reverses the chain. They pack
together to exclude water from the interior and form common motifs that in turn assemble
into semi-independent globular regions of the protein called domains. By taking the domain
as a fold unit and clustering similar structures at each level (i.e. secondary structures, motifs
and folds) it is possible to create a taxonomy of protein families based on structural
similarities.
The first X-ray crystal structure of a globular protein was reported in 1958 by
Kendrew [2], and since then thousands of structures have been determined by X-ray
crystallography and nuclear magnetic resonance (NMR). In November 2003 the Protein Data Bank
(http://www.rcsb.org/pdb) contained over 21000 protein structures, of which >5500 were
non-redundant, and around 3000 additional entries are added per
year. Classification of such a large number of proteins into structural families can best be
accomplished using automated methods that require unambiguous definitions at each
structural level. This chapter discusses the most common secondary structures, motifs and
folds and their classifications.
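As a rough illustration of what an unambiguous, automatable definition at one structural level might look like, the sketch below assigns a domain to a coarse class (mainly-alpha, mainly-beta or mixed) from a per-residue secondary-structure string. The function name classify_domain, the one-letter codes ('H' for helix, 'E' for strand) and the 15% cut-off are assumptions made for this example only; they are not the criteria used by CATH, SCOP or FSSP.

```python
from collections import Counter

# Hypothetical cut-off, chosen only for illustration; it is NOT the
# criterion used by CATH, SCOP or FSSP.
MIN_FRACTION = 0.15


def classify_domain(ss: str, min_fraction: float = MIN_FRACTION) -> str:
    """Assign a coarse structural class to one domain.

    `ss` holds one letter per residue: 'H' for helix, 'E' for strand,
    and any other character for loops or irregular regions (assumed encoding).
    """
    if not ss:
        raise ValueError("empty secondary-structure string")

    counts = Counter(ss)
    helix = counts["H"] / len(ss)    # fraction of residues in helices
    strand = counts["E"] / len(ss)   # fraction of residues in strands

    if helix >= min_fraction and strand >= min_fraction:
        return "mixed alpha/beta"
    if helix >= min_fraction:
        return "mainly-alpha"
    if strand >= min_fraction:
        return "mainly-beta"
    return "little secondary structure"  # catch-all for this sketch


if __name__ == "__main__":
    for example in ("HHHHHHHHCC", "EEEEECCEEEEE", "HHHHCCEEEE"):
        print(example, "->", classify_domain(example))
```

A single fixed threshold is used here only to keep the rule explicit and reproducible; a real classification would also need to consider how the secondary structures are arranged, not just how many residues they contain.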
1. The Peptide Unit
From the study of amide and dipeptide crystal structures, Pauling et al. [3] determined that
the length of the C-N bond (see figure 1) is 10% shorter than a normal C-N single bond, whereas the C-O
double bond is more than 1% longer than that seen in ketones and aldehydes. This is due to
resonance between the structures shown in figure 2, and corresponds to the C-N bond
having almost 50% double bond character. Consequently the peptide bond is planar and