advanced bioinformatics courses. Section 2 describes the basic concepts used in
macromolecular similarity analysis, pointing out, whenever possible, the parallel concepts
in other fields. Section 3 focuses on four distinct mathematical relationships, each of which
constitutes a possible definition of similarity: equivalence, matching, partial ordering, and
proximity.
1. Basic concepts
1.1 Model, description, analysis
When we speak about molecules, we mean not physical entities but abstract
models of reality. It is useful to distinguish three concepts underlying molecular data:
The models are the conceptual structures or mental representations used to store
information on molecules. No model incorporates all of the information available on a
given macromolecule; merely listing the atoms and bonds in a macromolecule would
be beyond the reach of human memory. Instead, we deal with a set of models of varying
complexity, each describing a certain aspect of the molecular structure, such as linear
sequence, domain topology, or active site contacts.
Various formal and/or narrative descriptions of the data constitute the backbone of
molecular databases. We can imagine the descriptions as the mathematical representation of
a particular model. Similarity measures are calculated between descriptions (and not
between models).
The analysis covers everything we do with molecular data in such fields as
molecular modelling, prediction, classification, similarity search, and visualization.
For example, we may notice a new regularity when classifying the existing molecular
descriptions (analysis). If this new feature “makes sense” (e.g. it points to a meaningful
subclass of the objects), we may include it in our abstract model and proceed to
construct a new kind of description that includes the new feature. In a further round of
analysis we may find new examples that contain the feature in question; in addition, we may
experiment with new feature candidates analogous or similar to the previously found
features. As this cycle is repeated, the models and the descriptions undergo an evolutionary
change, and in fact this is how databases develop [10].
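The point that similarity is calculated between descriptions, and not between models, can be illustrated with a minimal Python sketch. It takes one model (the linear sequence) and derives two different descriptions from it: the ordered string itself and an unordered residue-composition vector. The function names, the toy sequences, and the choice of the two similarity measures are assumptions made for illustration only; they are not taken from the text.

```python
from collections import Counter
from math import sqrt


def composition(sequence: str) -> Counter:
    """Describe a sequence by its residue counts (residue order is discarded)."""
    return Counter(sequence)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two residue-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identity_fraction(s: str, t: str) -> float:
    """Fraction of positions at which two equal-length sequences agree."""
    if len(s) != len(t):
        raise ValueError("sequences must have equal length")
    if not s:
        return 0.0
    return sum(x == y for x, y in zip(s, t)) / len(s)


# Two hypothetical toy sequences: seq_b is seq_a reversed.
seq_a = "ACDEFG"
seq_b = "GFEDCA"

# Description 1: the ordered string, compared position by position.
by_position = identity_fraction(seq_a, seq_b)

# Description 2: the composition vector, compared by cosine similarity.
by_composition = cosine_similarity(composition(seq_a), composition(seq_b))

print(f"positional identity:    {by_position:.2f}")
print(f"composition similarity: {by_composition:.2f}")
```

The same pair of molecules is maximally dissimilar under one description and essentially identical under the other, which is why a similarity value is only meaningful together with the description it was computed on.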
1.2 Entities, relationships, structure and function
To a first approximation, bioinformatics is concerned with the structure of protein and
DNA molecules that fulfil functions in a series of interdependent systems such as pathways,
cells, tissues, organs and organisms. This complex scenario is best described with the
concepts of systems theory (Figure 1).
According to systems theory [11, 12], a system is a group of interacting elements
functioning as a whole and distinguishable from its environment by recognizable
boundaries. Molecules can be regarded as such systems. Generally speaking, structure is
a fixed state of a system, and the study of a system usually starts with its characteristic
structures that are recurrent in space or time. As structures are detected by recurrence, the
symmetries (internal repetitions) are integral parts of structural descriptions. Using the
terms of the previous paragraphs, systems are conceptual models of reality, while structures
are descriptions.
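As a small illustration of detecting structure by recurrence, the following Python sketch lists the internal repetitions of a sequence: every substring of a given length that occurs more than once, together with its start positions. The function name, the fixed word length `k`, and the toy sequence are hypothetical choices for this example; exact substring matching is only the simplest possible notion of repetition and is not the method of the text.

```python
from collections import defaultdict
from typing import Dict, List


def internal_repeats(sequence: str, k: int) -> Dict[str, List[int]]:
    """Return each length-k substring that occurs more than once,
    mapped to the 0-based start positions of its occurrences."""
    if k <= 0:
        raise ValueError("k must be positive")
    positions: Dict[str, List[int]] = defaultdict(list)
    # Slide a window of width k over the sequence and record where each word starts.
    for i in range(len(sequence) - k + 1):
        positions[sequence[i:i + k]].append(i)
    # Keep only the words that recur, i.e. the internal repetitions.
    return {word: starts for word, starts in positions.items() if len(starts) > 1}


# Hypothetical toy sequence containing a tandem duplication.
repeats = internal_repeats("GATTACAGATTACA", 4)
for word, starts in repeats.items():
    print(word, starts)
```

The resulting table of recurring words and positions is itself a description in the sense used above: a formal representation of one aspect (internal repetition) of the underlying model.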