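[Figure 14 diagram, recovered labels]
Class (C): alpha (1), beta (2), alpha & beta (3)
Architecture (A), beta class: Roll (2.30), Sandwich (2.60), Trefoil (2.80)
Topology (T), sandwich architecture: Ig-like (2.60.40), Bence Jones protein (1rei); Jelly roll (2.60.120), C-reactive protein (1b09)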
Figure 14. Diagram depicting the hierarchical nature of CATH for the three main
classes.
4.3 The CATH Database
CATH is an acronym for the four main levels in the database hierarchy: Class (C),
Architecture (A), Topology (T) and Homologous superfamily (H). There is also a fifth
level, Sequence family (S). Classification is carried out using sequence alignment methods,
the structure comparison algorithm SSAP [46] and human intervention where the automatic
processes fail. Each entry is assigned a number that encodes its classification at each
level. CATH is accessible online at http://www.biochem.ucl.ac.uk/bsm/cath/. Users may
search by PDB code, CATH number or text. The hierarchy is described below and
illustrated in figure 14; the number of entries at each level in September 2003 is shown
in table 3.
Sequence family: Proteins with 35% sequence identity or greater are clustered at
this level.
Homologous superfamily: Equivalent to the SCOP superfamily, where structures
are grouped by their functional and structural similarity.
Topology: Similar to the SCOP common fold. Proteins with the same CAT number
have the same class, architecture and topology but do not necessarily belong to the
same homologous superfamily.
Architecture: This level clusters proteins within the same class by their general
shape irrespective of connectivity.
Class: CATH has four classes: mainly-α, mainly-β, α-β and irregular, the latter
containing proteins with low secondary structure content. The α/β and α+β classes
are distinguished at the topology level rather than the class level.
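The numbering scheme described in this section can be sketched in a few lines of Python. This is a minimal illustration, not CATH's own software: it assumes a CATH number is written as dot-separated integers, one per level, in the order C, A, T, H, S, and the function names parse_cath_number and deepest_shared_level are hypothetical. The example compares the Ig-like and jelly roll topologies of figure 14, which share a class and an architecture but differ in topology.

```python
from typing import Optional, Tuple

# Level names in the order given in the text. The dotted notation is an
# assumption made for this sketch.
LEVELS = (
    "Class",
    "Architecture",
    "Topology",
    "Homologous superfamily",
    "Sequence family",
)


def parse_cath_number(number: str) -> Tuple[int, ...]:
    """Split a dotted number such as '2.60.40' into integer components."""
    parts = tuple(int(p) for p in number.split("."))
    if not 1 <= len(parts) <= len(LEVELS):
        raise ValueError(f"expected 1 to {len(LEVELS)} components, got {len(parts)}")
    return parts


def deepest_shared_level(a: str, b: str) -> Optional[str]:
    """Return the deepest level at which two numbers still agree, or None."""
    shared = 0
    for x, y in zip(parse_cath_number(a), parse_cath_number(b)):
        if x != y:
            break
        shared += 1
    return LEVELS[shared - 1] if shared else None


if __name__ == "__main__":
    # Ig-like versus jelly roll: same class and architecture, different topology.
    print(deepest_shared_level("2.60.40", "2.60.120"))  # Architecture
    # Ig-like versus the Roll architecture: only the class is shared.
    print(deepest_shared_level("2.60.40", "2.30"))      # Class
```

Two entries with the same first three components have the same CAT number in the sense used above; whether they also belong to the same homologous superfamily depends on the fourth component.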
4.4 Comparison of SCOP, CATH and Dali
In 2003 an analysis of the three databases [54] found more agreement between the domain
definitions of SCOP and CATH than between Dali and either SCOP or CATH. Domain
mismatches can occur when part of a protein is excluded from the definition in one
database but not in the other. For example, in CATH both the N- and C-terminal domains of
MHC class II chains (1iea (A-D)) are classified as one domain, whereas SCOP only
includes the N-terminus. As a result, any structure matching the C-terminus will be
included in CATH but will not have an equivalent match in SCOP.
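
One simple way to picture such a mismatch is to treat each domain definition as a set of residue numbers and compare the sets. The sketch below is an illustration only and is not the procedure used in the analysis cited as [54]; the overlap measure, the function names and the residue ranges are all assumptions chosen for the example, and the ranges are not the actual boundaries of 1iea.

```python
from typing import Dict, Set


def residues(start: int, end: int) -> Set[int]:
    """Residue numbers from start to end inclusive, as a set."""
    return set(range(start, end + 1))


def overlap_fraction(a: Set[int], b: Set[int]) -> float:
    """Shared residues divided by the size of the larger domain."""
    if not a or not b:
        return 0.0
    return len(a & b) / max(len(a), len(b))


def unmatched_residues(db_a: Dict[str, Set[int]], db_b: Dict[str, Set[int]]) -> Set[int]:
    """Residues placed in some domain by db_a but in no domain by db_b."""
    covered_a = set().union(*db_a.values())
    covered_b = set().union(*db_b.values())
    return covered_a - covered_b


if __name__ == "__main__":
    # Hypothetical chain: one database defines a single domain over the whole
    # chain, the other only includes the N-terminal half.
    whole_chain = {"domain_1": residues(1, 180)}
    n_terminal_only = {"domain_1": residues(1, 90)}

    print(overlap_fraction(whole_chain["domain_1"], n_terminal_only["domain_1"]))  # 0.5
    print(len(unmatched_residues(whole_chain, n_terminal_only)))  # 90
```

In this made-up case the 90 C-terminal residues appear in one definition only, so a structure matching that region would have a counterpart in the first database but none in the second.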