Concepts of Similarity in Bioinformatics - Essays in Bioinformatics


databases. For example, the atoms are named in PDB files, but the connectivity of atoms in

amino acids is not part of the database; rather, it has to be included in the program reading

the database entries. If a description contains only entities or only relationships, we term it

an “unstructured description”. Examples include amino acid composition (only entities)

and Cα distance distributions (only relationships).
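
As a minimal sketch of an unstructured description, the snippet below computes an amino acid composition from a one-letter sequence. The function name and the example sequences are hypothetical and serve only as illustration; the point is that two sequences with the same residues in a different order receive the same description, because no relationships are recorded.

```python
from collections import Counter


def amino_acid_composition(sequence):
    """Return the fraction of each residue type in a one-letter sequence.

    Only the entities (residues) are counted; their order along the
    chain, i.e. the relationships between them, is discarded.
    """
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    return {residue: n / total for residue, n in counts.items()}


# Hypothetical example sequence, chosen only for illustration.
composition = amino_acid_composition("MKVLAAGA")

# Two different sequences built from the same residues are indistinguishable.
same_description = amino_acid_composition("AAGK") == amino_acid_composition("KGAA")
```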

Finally, descriptors can also be classified according to what they refer to.

Descriptors referring to an entire molecule, such as a protein

function, are global descriptors. Those referring to a part of the molecule, such as the

role of a domain within the protein, are local descriptors.

1.4 Overview of macromolecular descriptions

Based on the concepts introduced in the preceding sections we can now attempt to classify

the molecular descriptions. One simple classification distinguishes 1D, 2D and 3D

descriptions. 1D descriptions, such as sequences and hydrophobicity plots, are residue-

based, and include only the chain-topology. 2D descriptions are graph-like and include

relations in addition to the chain topology (e.g. helical circle and helical net diagrams

provide a symbolic view of the 3D arrangements). 3D descriptions are those in which

Cartesian coordinates are included among the descriptors.

A more detailed classification is possible according to the mathematical machinery used.

This classification essentially follows the one that Johnson set up for small molecules [2, 18].

The most complete description is a generalized labelled graph in which both the

vertices and the edges can be provided with arbitrary labels such as numbers, vectors,

names, or even statements in human language. Labels can be attached to individual entities or

to groups of them (such as segments of a polypeptide chain). This is a hypothetical, multi-

level description that is best approximated by a well-annotated 3D database record that is

cross-referenced to (possibly all) the available biological databases. Such variable-level

descriptions are rarely used for comparison. The 3D comparison programs of Sali and

Blundell are among the few exceptions; they use a hierarchy of levels such as atoms,

residues, secondary structures and domains [19, 20].
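
One way to picture a generalized labelled graph is a small container in which vertices, edges and groups of vertices each carry a free-form dictionary of labels. The sketch below is only an illustration under that assumption: the class `LabelledGraph`, its methods and the example labels are hypothetical, and it is not the data model of any of the programs cited above.

```python
from dataclasses import dataclass, field


@dataclass
class LabelledGraph:
    """Hypothetical container for a generalized labelled graph."""

    # vertex id -> {label name: value}
    vertex_labels: dict = field(default_factory=dict)
    # frozenset({u, v}) -> {label name: value} (undirected edges)
    edge_labels: dict = field(default_factory=dict)
    # group name -> (tuple of member vertex ids, {label name: value})
    group_labels: dict = field(default_factory=dict)

    def add_vertex(self, vertex, **labels):
        self.vertex_labels.setdefault(vertex, {}).update(labels)

    def add_edge(self, u, v, **labels):
        # Make sure both end points exist as vertices.
        for vertex in (u, v):
            self.vertex_labels.setdefault(vertex, {})
        self.edge_labels.setdefault(frozenset((u, v)), {}).update(labels)

    def add_group(self, name, members, **labels):
        # A group attaches labels to several entities at once,
        # for example to a segment of the polypeptide chain.
        self.group_labels[name] = (tuple(members), dict(labels))


# Illustrative values only: two residues, one bond, one annotated segment.
graph = LabelledGraph()
graph.add_vertex(1, residue="ALA", ca_xyz=(0.0, 0.0, 0.0))
graph.add_vertex(2, residue="GLY", ca_xyz=(3.8, 0.0, 0.0))
graph.add_edge(1, 2, relation="peptide bond")
graph.add_group("segment 1-2", [1, 2], note="free-text annotation")
```

Labels here may be numbers, vectors, names or free text, and the group entry shows how a label can be attached to a set of entities rather than to a single one.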

3D structures contain atoms as entities, provided with Cartesian coordinates as

descriptors, as well as a chemical (covalent) connectivity. This description is used by most of

the molecular modelling and structure comparison programs. Structural databases contain

the entities and their labels; the connectivity maps are included with the analysis programs.

Distance matrices. Distances calculated between the elements of the same structure

constitute a distance matrix. In 3D structures, one can use the positional coordinates to

define distance vectors, whereas the number of edges between two nodes can be used to

define a distance in a graph. Both are extensively used in similarity analysis.
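
The two kinds of distance mentioned above can be sketched as follows, assuming that the coordinate-based distance is the scalar Euclidean distance between two positions and that the graph distance is the smallest number of edges between two nodes, found here by breadth-first search. The function names and the three-point example are hypothetical.

```python
import math
from collections import deque


def euclidean_distance_matrix(coords):
    """Pairwise Euclidean distances between Cartesian coordinates."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def graph_distance_matrix(n_nodes, edges):
    """Pairwise graph distances: the smallest number of edges between nodes.

    Nodes are numbered 0 .. n_nodes - 1; unreachable pairs are left as None.
    """
    adjacency = [[] for _ in range(n_nodes)]
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)

    matrix = []
    for start in range(n_nodes):
        dist = [None] * n_nodes
        dist[start] = 0
        queue = deque([start])
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if dist[neighbour] is None:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        matrix.append(dist)
    return matrix


# Illustrative data: three points connected as a chain 0 - 1 - 2.
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
euclid = euclidean_distance_matrix(coords)
topological = graph_distance_matrix(3, [(0, 1), (1, 2)])
```

Both matrices are symmetric with a zero diagonal, so either can be compared between structures without first superimposing them.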

Finite sequences. All graphs can be represented in terms of finite sequences. A

protein sequence is a special graph where the residues are the entities and the polypeptide

chain connectivities are the edges. 1D plots (such as the hydrophobicity plot) can be

derived from an amino acid sequence by representing one single numeric parameter as a

function of the residue position. This parameter can be either an experimentally determined

value (such as a physicochemical parameter) or a quantity computed from the sequence or

from the 3D structure.
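
A 1D plot of this kind can be sketched as a sliding-window average of one numeric value per residue. The example below assumes the Kyte-Doolittle hydropathy scale and a window of five residues; neither choice comes from the text above, and the function name and test sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values (an assumed choice of scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=5):
    """Average hydropathy over a sliding window along the sequence.

    Returns one value per window position, i.e. len(sequence) - window + 1
    values, which can be plotted against the residue position.
    """
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and the sequence length")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical example sequence, chosen only for illustration.
profile = hydropathy_profile("MKVLAAGA", window=3)
```

Replacing the lookup table with any other per-residue quantity, measured or computed, gives the corresponding 1D description without changing the rest of the code.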

Surfaces used for proteins include the van der Waals surface and the electrostatic

surfaces that are computable from the 3D structure. Surface similarity analysis is not

covered in this review; excellent reviews can be found in [21-23].

Integrable scalar fields. In this representation the molecule is treated as a spatial

distribution of a single quantity, such as electron density or mass density [24].
