relation is antisymmetric if A ≤ B and B ≤ A imply that A and B are identical. For
example, if A ≤ B means that A is a subsequence of B, then ≤ is a partial ordering relation.
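The sketch below checks the three partial-order properties (reflexivity, antisymmetry, transitivity) for the subsequence relation mentioned above on a small set of toy strings. It assumes "subsequence" means a not necessarily contiguous subsequence. A contiguous-substring test (`a in b`) would pass the same checks. The helper names `is_subsequence` and `is_partial_order` are illustrative. For contrast, a length comparison is also tested; it is reflexive and transitive but not antisymmetric.

```python
from itertools import product

def is_subsequence(a: str, b: str) -> bool:
    """Return True if a is a (not necessarily contiguous) subsequence of b."""
    it = iter(b)
    # 'ch in it' advances the iterator past the match, so order is preserved
    return all(ch in it for ch in a)

def is_partial_order(items, rel) -> bool:
    """Check reflexivity, antisymmetry and transitivity of rel on a finite set."""
    reflexive = all(rel(a, a) for a in items)
    antisymmetric = all(
        not (rel(a, b) and rel(b, a)) or a == b
        for a, b in product(items, repeat=2)
    )
    transitive = all(
        not (rel(a, b) and rel(b, c)) or rel(a, c)
        for a, b, c in product(items, repeat=3)
    )
    return reflexive and antisymmetric and transitive

sequences = ["", "A", "AC", "ACG", "AG", "GA", "ACGT"]
print(is_partial_order(sequences, is_subsequence))  # True
# Comparing lengths is not antisymmetric: "AG" and "GA" relate both ways
print(is_partial_order(sequences, lambda a, b: len(a) <= len(b)))  # False
```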
Figure 5. Similarity of molecules can be considered either a tolerance relationship
(a) or an equivalence relationship (b), depending on whether or not the basis of
similarity, the shared substructure, is fixed.
2.1.3 Tolerance, general and specific similarity
Tolerance relations denote the common sense situation in which two things have a common
part or feature, or two structures share a common substructure. A relation ~ is called a
tolerance if it is reflexive and symmetric but, in contrast to equivalence relations, not
necessarily transitive. In other words, A~A holds for every A, and A~B implies B~A. Tolerance
comes closest to the common sense concept of similarity; however, there is an important
distinction to be made. Based on the psychological concept of Goldmeier [15, 16], we can call
two structures similar if they share some common substructure (see Figure 3, above). This
general similarity is not transitive, as shown in Figure 5a; it is in fact a tolerance
relationship. By contrast, we may use the term specific similarity if two structures
share a well-defined substructure (feature). Fixing the shared substructure renders the
relationship transitive, so specific similarity is an equivalence relationship (Figure 5b).
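To make the distinction concrete, the sketch below represents each structure as a set of substructure labels. General similarity ("shares some substructure") is a non-empty intersection. Specific similarity ("both contain one fixed substructure") is membership of a chosen feature. The structures, labels and function names are hypothetical toy data, not taken from the text.

The transitivity check fails for general similarity (A~B and B~C, but not A~C) and holds for specific similarity. Specific similarity is also reflexive only on structures that contain the fixed feature, so it is an equivalence relation on that subset.

```python
from itertools import product

# Hypothetical toy "structures", each represented as a set of substructure labels
structures = {
    "A": {"ring", "amide"},
    "B": {"amide", "hydroxyl"},
    "C": {"hydroxyl", "halogen"},
}

def general_similar(x: set, y: set) -> bool:
    """General similarity: the two structures share *some* substructure."""
    return bool(x & y)

def specific_similar(x: set, y: set, feature: str) -> bool:
    """Specific similarity: both structures contain one fixed substructure."""
    return feature in x and feature in y

def is_transitive(items, rel) -> bool:
    """True if rel(a, b) and rel(b, c) always imply rel(a, c) on items."""
    return all(
        not (rel(a, b) and rel(b, c)) or rel(a, c)
        for a, b, c in product(items, repeat=3)
    )

sets = list(structures.values())
print(is_transitive(sets, general_similar))  # False: A~B, B~C, but not A~C
print(is_transitive(sets, lambda x, y: specific_similar(x, y, "amide")))  # True
```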
If biological sequences are found similar to each other by BLAST, this is a general
similarity, i.e. it is not necessarily true that all of them share a subsequence such as a
protein domain. However, those sequences that turn out to share a common subsequence
form an equivalence class. Note that a “common subsequence” is often defined
empirically: biologists usually decide, based on their prior knowledge, whether or not a
subsequence of a protein is a true member of a domain group (like EGF domains), and once
a positive decision is made, the protein sequence is accepted as a member of the
equivalence class of EGF-containing proteins. We might say that evaluating BLAST
searches amounts to distinguishing general from specific similarity.
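The sketch below illustrates the point about BLAST with two crude stand-ins. A shared k-mer stands in for a general-similarity hit, which is not BLAST. A fixed regular-expression motif stands in for membership in a domain group. Real domain assignment, for example of EGF domains, relies on curated models and expert judgment, not on a single pattern. The sequences, the `C.C` motif and the helper names are hypothetical.

Sequences p1 and p3 are "generally similar" because they share a k-mer. Only p1 contains the motif, so only p1 falls into the motif's equivalence class.

```python
import re

# Hypothetical toy sequences; not real proteins
SEQUENCES = {
    "p1": "MKTCACLLGA",
    "p2": "GGCWCPRTAA",
    "p3": "MKTAALLGAA",
    "p4": "PPCGCKKTAA",
}
# Hypothetical motif standing in for a domain (e.g. EGF), chosen for illustration
DOMAIN_PATTERN = re.compile(r"C.C")

def share_kmer(a: str, b: str, k: int = 3) -> bool:
    """Crude proxy for general similarity: the sequences share any k-mer."""
    kmers = {a[i:i + k] for i in range(len(a) - k + 1)}
    return any(b[i:i + k] in kmers for i in range(len(b) - k + 1))

def domain_class(seqs: dict, pattern: re.Pattern) -> set:
    """Names of sequences accepted as containing the domain (one class)."""
    return {name for name, s in seqs.items() if pattern.search(s)}

members = domain_class(SEQUENCES, DOMAIN_PATTERN)
print(sorted(members))  # ['p1', 'p2', 'p4']
print(share_kmer(SEQUENCES["p1"], SEQUENCES["p3"]))  # True, yet p3 lacks the motif
```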
The use of relations in chemical structure analysis is reviewed in [2, 18].
2.2 Proximity measures
Proximity measures (PM) are numeric measures designed to characterize the similarity or
dissimilarity of two molecular descriptions. Two general types of proximity measures are in
use. Similarity measures are high for similar molecules and low for dissimilar ones.
Distance measures, on the other hand, are zero for identical molecular descriptions and high
for dissimilar ones. In the following, we will use the terms proximity measure, distance
measure and similarity measure in this sense.
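As an illustration of the two types, the sketch below uses the Tanimoto (Jaccard) coefficient on sets of features, such as the "on" bits of a molecular fingerprint, together with its complement as a distance. This particular measure is an illustrative choice; the text does not prescribe it. Treating two empty descriptions as identical is also an assumption made here.

The similarity is 1 and the distance is 0 for identical descriptions. For disjoint descriptions, the similarity is 0 and the distance is 1.

```python
def tanimoto_similarity(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity of two feature sets: |A ∩ B| / |A ∪ B|."""
    union = a | b
    if not union:
        return 1.0  # assumed convention: two empty descriptions are identical
    return len(a & b) / len(union)

def tanimoto_distance(a: set, b: set) -> float:
    """Complementary distance: 0 for identical descriptions, 1 for disjoint ones."""
    return 1.0 - tanimoto_similarity(a, b)

x = {1, 4, 7, 9}  # hypothetical feature indices of molecule x
y = {1, 4, 8}     # hypothetical feature indices of molecule y
print(tanimoto_similarity(x, x), tanimoto_distance(x, x))  # 1.0 0.0
print(tanimoto_similarity(x, y))  # 0.4  (2 shared features / 5 in the union)
```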
Proximity measures can be used in vastly different contexts, and it is useful to
define two situations that are common in bioinformatics applications. A) Simple proximity