databases. For example, the atoms are named in PDB files, but the connectivity of atoms in
amino acids is not part of the database; rather, it has to be included in the program reading
the database entries. If a description contains only entities or only relationships, we term it
an “unstructured description”. Examples include amino acid composition (only entities)
and Cα distance distributions (only relationships).
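As a small illustration of an unstructured description made of entities only, the sketch below computes an amino acid composition from a one-letter sequence. The function name and the example sequence are hypothetical and serve only to illustrate the idea.

```python
from collections import Counter


def amino_acid_composition(sequence):
    """Return the fraction of each residue type in a one-letter sequence.

    Only the entities (residues) are counted; their order and any
    relationships between them are discarded.
    """
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    return {residue: n / total for residue, n in counts.items()}


# Hypothetical example sequence, used only for illustration.
composition = amino_acid_composition("MKVLAAGIVK")
```

Two sequences with the same residues in a different order give the same composition, which is what makes this description unstructured.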
Finally, descriptors can also be classified according to what they refer to.
Descriptors referring to an entire molecule, such as a protein
function, are global descriptors. Descriptors referring to a part of the molecule,
such as the role of a domain within the protein, are local descriptors.
1.4 Overview of macromolecular descriptions
Based on the concepts introduced in the preceding sections we can now attempt to classify
the molecular descriptions. One simple classification distinguishes 1D, 2D and 3D
descriptions. 1D descriptions, such as sequences and hydrophobicity plots, are residue-
based, and include only the chain-topology. 2D descriptions are graph-like and include
relations in addition to the chain topology (e.g. helical circle and helical net diagrams
provide a symbolic view of the 3D arrangements). 3D descriptions are those in which
Cartesian coordinates are included among the descriptors.
A more detailed classification is possible according to the mathematical machinery used.
This classification essentially follows the one that Johnson set up for small molecules [2, 18].
The most complete description is a generalized labelled graph in which both the
vertices and the edges can be provided with arbitrary labels such as numbers, vectors,
names, or even statements in human language. Labels can be attached to individual entities or
to groups of them (such as segments of a polypeptide chain). This is a hypothetical, multi-
level description that is best approximated by a well-annotated 3D database record that is
cross-referenced to (possibly all) the available biological databases. Such variable-level
descriptions are rarely used for comparison. The 3D comparison programs of Sali and
Blundell are among the few exceptions: they use a hierarchy of levels such as atoms,
residues, secondary structures and domains [19, 20].
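A generalized labelled graph of this kind can be sketched as a simple data structure. The class below is only an illustrative assumption, not the representation used by any of the cited programs: the class name, method names and label keys are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LabelledGraph:
    """Minimal labelled graph: arbitrary labels on vertices, edges and groups."""

    vertex_labels: dict = field(default_factory=dict)  # vertex id -> labels
    edge_labels: dict = field(default_factory=dict)    # frozenset({u, v}) -> labels
    group_labels: dict = field(default_factory=dict)   # tuple of vertex ids -> labels

    def add_vertex(self, vertex, **labels):
        self.vertex_labels.setdefault(vertex, {}).update(labels)

    def add_edge(self, u, v, **labels):
        # Undirected edge, stored under an order-independent key.
        self.edge_labels.setdefault(frozenset((u, v)), {}).update(labels)

    def label_group(self, vertices, **labels):
        # Labels attached to a group of entities, e.g. a chain segment.
        self.group_labels.setdefault(tuple(vertices), {}).update(labels)


# Hypothetical two-residue fragment with numeric, vector and text labels.
graph = LabelledGraph()
graph.add_vertex(1, name="ALA", ca_xyz=(0.0, 0.0, 0.0))
graph.add_vertex(2, name="GLY", ca_xyz=(3.8, 0.0, 0.0))
graph.add_edge(1, 2, kind="peptide bond")
graph.label_group((1, 2), note="N-terminal segment")
```

Keeping group labels separate from vertex labels is one way to mirror the multi-level character of the description, since a segment or domain can carry its own annotation.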
3D structures contain atoms as entities provided with Cartesian coordinates as
descriptors, as well as chemical (covalent) connectivity. This description is used by most of
the molecular modelling and structure comparison programs. Structural databases contain
the entities and their labels; the connectivity maps are included with the analysis programs.
Distance matrices. Distances calculated between the elements of the same structure
constitute a distance matrix. In 3D structures, one can use the positional coordinates to
define distance vectors, whereas the number of edges between two nodes can be used to
define a distance in a graph. Both are extensively used in similarity analysis.
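The two kinds of distance can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names and the toy coordinates and edges are hypothetical. The first function uses Euclidean distances between coordinates, the second counts edges along the shortest path by breadth-first search.

```python
import math
from collections import deque


def euclidean_distance_matrix(coords):
    """Pairwise Euclidean distances between 3D points."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def graph_distance_matrix(n_nodes, edges):
    """Pairwise shortest-path lengths (number of edges) in an undirected graph.

    Unreachable pairs are reported as None.
    """
    adjacency = {i: [] for i in range(n_nodes)}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)

    matrix = []
    for source in range(n_nodes):
        dist = [None] * n_nodes
        dist[source] = 0
        queue = deque([source])
        while queue:  # breadth-first search from the source node
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if dist[neighbour] is None:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        matrix.append(dist)
    return matrix


# Toy data, for illustration only.
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
spatial = euclidean_distance_matrix(coords)
topological = graph_distance_matrix(4, [(0, 1), (1, 2), (2, 3)])
```

Both matrices are symmetric with a zero diagonal. For an unbranched chain, the graph distance between two residues is simply their separation along the sequence.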
Finite sequences. All graphs can be represented in terms of finite sequences. A
protein sequence is a special graph where the residues are the entities and the polypeptide
chain connectivities are the edges. 1D plots (such as the hydrophobicity plot) can be
derived from an amino acid sequence by representing one single numeric parameter as a
function of the residue position. This parameter can be either an experimentally determined
value (such as a physicochemical parameter) or a quantity computed from the sequence or
from the 3D structure.
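A 1D plot of this kind can be sketched in a few lines. The example below assumes the Kyte-Doolittle hydropathy scale and a sliding-window average; both are choices made here for illustration, since the text does not prescribe a particular parameter or smoothing. The function name and example sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values per residue (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=3):
    """Average hydropathy over a sliding window along the sequence.

    Returns one value per window position, i.e. len(sequence) - window + 1
    values. Raises KeyError for residues missing from the scale.
    """
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical example sequence, used only for illustration.
profile = hydropathy_profile("MKVLAAGIVK", window=3)
```

Plotting the profile values against the window position gives the 1D plot. With `window=1` the function returns the raw per-residue values.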
Surfaces used for proteins include the van der Waals surface and the electrostatic
surfaces that are computable from the 3D structure. Surface similarity analysis is not
covered in this review; excellent reviews can be found in [21-23].
Integrable scalar fields. In this representation the molecule is treated as a spatial
distribution of a single quantity, such as electron density or mass density [24].
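One simple way to picture such a field is to let every atom contribute a smooth, integrable function of position. The sketch below assumes an isotropic Gaussian per atom, normalized so that each atom integrates to its weight. This model, the function name and the toy atoms are assumptions made for illustration and are not taken from [24].

```python
import math


def gaussian_density(point, atoms, sigma=1.0):
    """Value at `point` of a scalar field built from per-atom Gaussians.

    `atoms` is a list of (x, y, z, weight) tuples. Each atom contributes a
    normalized 3D Gaussian of width `sigma`, so the field integrates to the
    sum of the weights.
    """
    norm = (2.0 * math.pi * sigma ** 2) ** -1.5
    total = 0.0
    for x, y, z, weight in atoms:
        r2 = (point[0] - x) ** 2 + (point[1] - y) ** 2 + (point[2] - z) ** 2
        total += weight * norm * math.exp(-r2 / (2.0 * sigma ** 2))
    return total


# Toy example: a single atom of unit weight at the origin.
atoms = [(0.0, 0.0, 0.0, 1.0)]
at_centre = gaussian_density((0.0, 0.0, 0.0), atoms)
further_out = gaussian_density((2.0, 0.0, 0.0), atoms)
```

The weight can stand for whatever single quantity is being distributed, for example an electron count or an atomic mass.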