Transforms. There are various methods to calculate topological transforms from
graphs. Fourier transforms of 1D sequence plots have been used to identify amphiphilic
regions in proteins, as well as to compare proteins.
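As a rough illustration of the Fourier approach, the sketch below measures how strongly
a 1D hydrophobicity plot of a sequence oscillates at a chosen angular frequency. The
choice of the Kyte-Doolittle hydropathy scale, the 100 degrees per residue that is
typical of an alpha helix, the function name periodicity_power and the toy sequence are
all assumptions made for this example; none of them is prescribed by the text.

```python
import cmath
import math

# Kyte-Doolittle hydropathy values (an assumed choice of scale for this sketch).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def periodicity_power(sequence, angle_deg=100.0):
    """Power of the hydropathy plot at one angular frequency (degrees per residue)."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence]
    mean = sum(values) / len(values)
    omega = math.radians(angle_deg)
    # Single Fourier component of the mean-centred plot.
    component = sum(
        (v - mean) * cmath.exp(1j * omega * n) for n, v in enumerate(values)
    )
    return abs(component) ** 2 / len(values)


# Hypothetical toy peptide with a heptad-like hydrophobic/polar pattern.
helix_like = "LKKLLKKLKKLLKK"
helical_power = periodicity_power(helix_like, angle_deg=100.0)
strand_power = periodicity_power(helix_like, angle_deg=180.0)
```

A window whose power near 100 degrees clearly exceeds its power at other frequencies is
a candidate amphiphilic helix; scanning such windows along a protein gives a simple
profile of this kind.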
Finite sets are unstructured descriptors that can be obtained, for example, by omitting all
relationships from a labeled graph. The resulting set of entities provides a description that
can be ordered according to kinds. A typical example is the amino acid composition, or other
fragment-composition type descriptions (dipeptide, tripeptide, etc. compositions). This is a
vector representation: each component of the vector corresponds to the number of times a
certain entity is present in a structure. A subcase of finite set descriptions consists in
reducing the set of entities to a set (list) of kinds. This can be achieved by omitting the
numbers from a compositional description.
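The composition descriptors above can be sketched in a few lines. The function names
composition and kinds, and the toy sequence, are hypothetical and serve only as an
illustration.

```python
from collections import Counter


def composition(sequence, k=1):
    """Count every overlapping fragment of length k (k=1: residues, k=2: dipeptides)."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kinds(sequence, k=1):
    """Omit the counts and keep only which fragments occur at all."""
    return set(composition(sequence, k))


seq = "MKKLLK"  # hypothetical toy sequence
residue_counts = composition(seq)         # M: 1, K: 3, L: 2
dipeptide_counts = composition(seq, k=2)  # MK, KK, KL, LL, LK: 1 each
residue_kinds = kinds(seq)                # the set of M, K and L
```

Fixing an order for the possible fragments (for instance alphabetical) turns each
Counter into the fixed-length vector described in the text.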
Distributions. A vector consisting of nonnegative numbers that sum to unity
constitutes a parameter vector of a multinomial distribution. Typical examples are the amino
acid composition expressed as relative frequencies (percentages), the distribution of
inter-atomic distances within a protein structure, and the distribution of connectivity
degrees in large networks.
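A minimal sketch of how a count vector becomes such a distribution is given below; the
function name to_distribution and the toy sequence are assumptions made for this example.

```python
from collections import Counter


def to_distribution(counts):
    """Divide each count by the total so that the values sum to one."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot normalize an empty composition")
    return {kind: n / total for kind, n in counts.items()}


# Hypothetical toy sequence; Counter gives the raw amino acid composition.
distribution = to_distribution(Counter("MKKLLK"))
```

Multiplying each value by 100 gives the composition in percentages.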
Vectors, product spaces. In addition to the special vectors described above
(compositions and distributions), arbitrary parameters of a given molecule can be assembled
into vectorial descriptions. Such complex descriptions are used as input in machine learning,
and are also often used in general pattern-recognition applications.
Real numbers (molecular sizes, molecular weight etc.) are perhaps the simplest
descriptors of molecules.
2. Mathematical concepts related to similarity
2.1 Relations
2.1.1 Equivalence
Equivalence relations (denoted here by “#”) are related to the commonly used term
identity. Strictly speaking, a molecule can only be identical with itself; here we are
concerned with cases in which two molecules have identical mathematical descriptions,
which does not mean that the molecules themselves are identical. For example, two proteins
that have an identical description in terms of amino acid sequence may undergo
phosphorylation or other post-translational modifications at different sequence positions.
Equivalence relations in mathematics are defined by three properties: reflexivity,
symmetry, and transitivity. A relation is reflexive if A # A for all molecular descriptions A.
It is symmetric if A # B implies B # A. It is transitive if A # B and B # C together imply
A # C. Let [A] denote the family of those molecules equivalent to A with respect to #. If B
denotes some other molecule, it can be proven mathematically that either [A] and [B] denote
the same set of molecules or the two sets have no members in common. The set [A] is called an
equivalence class. For example, two proteins are considered identical if and only if their
(amino acid) sequences are the same. Note that “identity” refers to a given description;
in this example the potential differences in post-translational modifications are disregarded.
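The three defining properties and the resulting equivalence classes can be illustrated
on a small finite collection. The sketch below is an illustration only: the function
names is_equivalence and equivalence_classes and the three protein records (name,
sequence, phosphorylated positions) are hypothetical.

```python
from collections import defaultdict
from itertools import product


def is_equivalence(items, related):
    """Brute-force check of reflexivity, symmetry and transitivity on a finite list."""
    reflexive = all(related(a, a) for a in items)
    symmetric = all(
        related(b, a) for a, b in product(items, repeat=2) if related(a, b)
    )
    transitive = all(
        related(a, c)
        for a, b, c in product(items, repeat=3)
        if related(a, b) and related(b, c)
    )
    return reflexive and symmetric and transitive


def equivalence_classes(items, describe):
    """Group items whose descriptions are equal; each group is one class [A]."""
    classes = defaultdict(list)
    for item in items:
        classes[describe(item)].append(item)
    return list(classes.values())


# Hypothetical records: (name, sequence, phosphorylated positions).
proteins = [
    ("P1", "MKTAY", (3,)),
    ("P2", "MKTAY", (5,)),
    ("P3", "MKLAY", ()),
]


def same_sequence(p, q):
    """The relation '#': two proteins are related if their sequences are equal."""
    return p[1] == q[1]


classes = equivalence_classes(proteins, describe=lambda p: p[1])
```

Here P1 and P2 fall into the same class although their modification sites differ,
because the chosen description (the sequence alone) disregards those sites.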
2.1.2 Partial ordering
Partial ordering relations are related to the commonly used terms “to be a substructure of”
and “to be a part of”. A relation ≤ is called a partial order if it is reflexive, antisymmetric, and
transitive. The reflexive and transitive properties of a relation were defined earlier. A