In general, 3D shape comparison software requires: i) a representation of the
molecules, usually the xyz coordinates of all the Cα atoms; ii) an objective function, for
example, rotating and translating one molecule relative to the other and measuring the
intermolecular distances between equivalent points on the two chains; and iii) a comparison
algorithm, which requires decision rules derived from statistical analysis of multiple samples.
When classifying 3D shape, redundancy is removed by treating all sequences that share
more than 25% identity as equivalent and choosing the domain as the fold unit [49]. The advent of this
technology has led to the compilation of a number of structural databases such as FSSP
[50], SCOP [51] and CATH [52], which are accessible on the World Wide Web.
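
As an illustration of the objective function in point ii), the sketch below applies a rigid-body transform (rotation, then translation) to one set of Cα coordinates and measures the root-mean-square deviation (RMSD) between equivalent points. RMSD is used here as one common choice of distance measure; the source does not name a specific one. The function names and the toy coordinates are hypothetical, and the rotation is supplied by hand rather than found by an optimisation procedure.

```python
import math

def transform(points, rotation, translation):
    """Apply a 3x3 rotation matrix, then a translation vector, to each xyz point."""
    moved = []
    for x, y, z in points:
        moved.append(tuple(
            row[0] * x + row[1] * y + row[2] * z + shift
            for row, shift in zip(rotation, translation)
        ))
    return moved

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between equivalent points of two chains."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("chains must be non-empty and have equivalent points")
    total = sum(
        (a - b) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for a, b in zip(point_a, point_b)
    )
    return math.sqrt(total / len(coords_a))

# Hypothetical toy data: chain_b is chain_a rotated 90 degrees about z, then shifted along x.
chain_a = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
rot_z_90 = [(0.0, -1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
chain_b = transform(chain_a, rot_z_90, (2.0, 0.0, 0.0))

# The inverse transform (transposed rotation, matching translation) superimposes b on a.
inv_rot = [(0.0, 1.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
inv_shift = (0.0, 2.0, 0.0)
superimposed = transform(chain_b, inv_rot, inv_shift)

before = rmsd(chain_a, chain_b)       # chains in different orientations
after = rmsd(chain_a, superimposed)   # chains superimposed
```

In a real comparison program the rotation and translation would be searched for so as to minimise the distance measure; here they are known in advance so that the effect of superposition on the score is easy to see.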
4.1 FSSP
Fold classification based on Structure-Structure alignment of Proteins (FSSP) uses the fully
automated structure comparison algorithm DALI (Distance ALIgnment algorithm) to
calculate a pairwise structural similarity value (S score) between protein chains. The S
scores for all pairs of proteins are evaluated and converted to statistically meaningful Z scores.
Protein pairs with comparable scores are considered to have similar folds. A hierarchical
structure, the Dali Domain Dictionary [53], has been created, which allows direct
comparison with SCOP and CATH. FSSP is accessible on the World Wide Web at
http://www.bioinfo.biocenter.helsinki.fi:8080/dali/index.html
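
The step from raw S scores to Z scores can be illustrated with a plain standardisation: subtract the mean of all pairwise scores and divide by their standard deviation. This is only a minimal sketch of the general idea. The text does not describe how DALI itself normalises its scores, and the function names, the example scores and the cutoff value below are all hypothetical.

```python
import statistics

def z_scores(raw_scores):
    """Standardise raw scores: (score - mean) / population standard deviation."""
    mean = statistics.mean(raw_scores)
    spread = statistics.pstdev(raw_scores)
    if spread == 0:
        raise ValueError("all scores are identical; Z scores are undefined")
    return [(score - mean) / spread for score in raw_scores]

def significant_pairs(pair_scores, z_cutoff):
    """Return the pairs whose Z score is at or above the chosen cutoff."""
    pairs = list(pair_scores)
    standardised = z_scores([pair_scores[pair] for pair in pairs])
    return {pair: z for pair, z in zip(pairs, standardised) if z >= z_cutoff}

# Hypothetical S scores for three chain pairs (not taken from any database).
s_scores = {("A", "B"): 10.0, ("A", "C"): 12.0, ("B", "C"): 50.0}

# The cutoff of 1.0 is an arbitrary value chosen for this illustration.
hits = significant_pairs(s_scores, z_cutoff=1.0)
```

With these toy values only the pair ("B", "C") lies more than one standard deviation above the mean, so it is the only pair retained.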
4.2 The SCOP Database
The Structural Classification of Proteins (SCOP) database provides a detailed description of
the structural and evolutionary relationships of proteins of known structure and is
accessible on the World Wide Web at http://scop.mrc-lmb.cam.ac.uk/scop/. There are two
search facilities. One allows the user to enter a sequence to obtain a list of structures with
significant sequence homology; the other allows the user to enter a keyword to match text in
the SCOP database and headers in the Protein Data Bank (PDB).
SCOP protein classification is a mainly manual process that uses visual inspection to
compare structures, but it also employs sequence homology and a variety of automated
procedures. The unit of classification is the domain, with each domain treated separately in
multidomain proteins. The hierarchy is described below, and the number of entries at each level
as of November 2003 is shown in Table 2.
Family: Proteins with 30% sequence identity or greater, or those with lower sequence
homology but very similar structures and functions, are clustered into families.
Superfamily: Superfamilies contain families that have low sequence homology but
for which a common evolutionary origin is suggested by structural and functional
similarities.
Fold: Proteins which have their α-helices and β-sheets in the same topological
order and architectural arrangement are defined as having a common fold.
Class: There are seven classes: the four mentioned above (α, β, α/β and α+β), small
proteins, multidomain proteins (for folds consisting of two or more domains belonging to
different classes), and membrane proteins.
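
The four-level hierarchy above can be modelled as a simple record with one label per level. The sketch below is an illustration only, not SCOP's own data format: the `Domain` class, the `deepest_shared_level` helper and every label in the example are hypothetical. It compares two domains from the top of the hierarchy downwards and reports the most specific level they share.

```python
from dataclasses import dataclass

# Levels from most general to most specific, as described in the text.
LEVELS = ("scop_class", "fold", "superfamily", "family")

@dataclass(frozen=True)
class Domain:
    """One classified domain; each field holds the label assigned at that level."""
    name: str
    scop_class: str
    fold: str
    superfamily: str
    family: str

def deepest_shared_level(a, b):
    """Return the most specific level at which two domains agree, or None."""
    shared = None
    for level in LEVELS:
        if getattr(a, level) != getattr(b, level):
            break  # lower levels are nested, so stop at the first mismatch
        shared = level
    return shared

# Hypothetical domains with placeholder labels (not real SCOP entries).
dom_a = Domain("domain-A", "alpha", "fold-1", "superfamily-1", "family-1")
dom_b = Domain("domain-B", "alpha", "fold-1", "superfamily-1", "family-2")
dom_c = Domain("domain-C", "beta", "fold-1", "superfamily-1", "family-1")

same_superfamily = deepest_shared_level(dom_a, dom_b)  # differ only at family level
unrelated = deepest_shared_level(dom_a, dom_c)         # differ already at class level
```

Stopping at the first mismatch reflects the nesting of the hierarchy: two domains placed in different classes are not considered to share a fold, even if a lower-level label happens to coincide.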