Information Technology Reference
In-Depth Information
an analogy, features of one object are paired with features of another object. Which features
are paired is often a subjective choice. Matching of molecular structure relies in comparing
molecular descriptions finding an optimal match between the features of the mathematical
representations.
The simplest example of matching is the alignment of short character strings of
equal length described in Figure 6. Another example is to find the exact occurrence of a
short character string query within another, longer string.
With these examples one can illustrate, without formal definition, some of the
important properties of matching used in bioinformatics. Given two descriptions A, B
consisting of i and j elements, respectively, let's define a mapping m : A Æ B that assigns
certain k elements in A to certain l elements in B ; it is not necessary that all elements in A
have an element assigned in B and vice versa. In the simplest case, the mapping is one-to-
one, so certain elements in A will have a pair in B (so k=l ). In other cases, we may get
multiple matching.
Types of alignments [2]. From the philosophical point of view, matching
(alignments) of structural descriptions can arise from three specific sources. i) Due to prior
knowledge, only some parts of the molecules are considered when establishing a matching.
For example, backbone atoms of a protein must match backbone atoms of the other protein,
when the 3-D structures are compared. ii) If we deal with unstructured descriptions, such as
vectors, the elements - the vector components - match by definition, which is called
canonical matching; iii) Finally, we might be interested in the maximal matching of two
larger structures given in the form of structured descriptions (consisting of both entities and
relationships), and then we need an optimizeable similarity measure such as in equation 5,
in order to find a maximal alignment. The number of all possible alignments is very high,
so finding the optimum is often very compute-intensive and sometimes intractable problem.
Algorithmic solutions From the algorithmic point of view, the methods can be
subdivided a) according to the structure types (character strings, graphs, 3D structures), b)
according to the nature of the matches that are being sought (exact matching, approximate
matching), or according to number of partners compared (pairwise alignments or multiple
alignments).
The majority of algorithms can cope only with the simplest descriptions, character
sequences. Finding exact matches between two sequences of n and m characters
respectively, has a complexity of O(n,m) . Comparison of graphs is much more difficult,
here the majority of the problems are NP complete, i.e. computationally not tractable.
Identity of graphs is determined by graph isomorphism algorithms, and similarity of graphs
(such as protein structures, metabolic pathways) is a subgraph isomorphism problem, which
is more difficult, and is aggravated by the fact that the structures in question are labelled
graphs. Rigorous comparison of complex descriptions such as 3D structures is np-hard.
Time raqirements can be decreased if we use unstructured descriptions that do not
need an alignment for comparison. These descriptions are simpler, and perform well only if
one can find an adequate resolution, such as a multidimensional vector. The calculations are
fast, but there is no guarantee that the results will be similar to those produced by alignment
methods. On the other hand, such methods are useful as preliminary filters used to screen
large databases.
Heuristic solutions provide the second general avenue for diminishing computer
times. Most of the alignment methods use some kind of heuristics. There are two important
heuristics that are used to simplify the process of alignment, the principle of linearity and
the use of higher order descriptions.
The principle of linearity is based on the chain-like topology of protein and DNA
molecules. All biologically important alignments contain short stretches that are very
similar or identical between the two molecules. So instead of testing all possible
Search WWH ::




Custom Search