the comparison of distributions, such as the χ^2 test and contingency table analysis, etc. [37], that all yield probability values between 0 and 1.
Probability-based measures are widely used for the evaluation of prediction
methods [32, 33]. Similarity measures for chemical structures have been reviewed by
Willett [31].
2.9 Proximity measures for groups of objects
Proximity measures originally defined for pairs of structural descriptions can be generalized to groups. Given a single description S and a group of descriptions [A] = {A_1, A_2, ..., A_n}, a proximity measure P(S, [A]) between S and [A] can be defined using the P(S, A_i) values of the pairwise comparisons; for example, one can take the minimum, the maximum or the average of the P(S, A_i) values as the proximity measure between S and the group. Another possibility is to calculate from the descriptions A_i a "consensus value" <A>, sometimes called the centroid of [A]. If the descriptions are simple numeric values, <A> can be defined as their average; if the A_i are vectors, <A> can be their vectorial average, etc. Then, the proximity measure between S and [A] can be calculated as P(S, <A>).
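As a concrete sketch of these options (in Python with NumPy; the Euclidean distance and the example vectors are illustrative assumptions standing in for a generic P), the single-object-to-group measures can be written as:

import numpy as np

def pairwise_dist(x, y):
    # Euclidean distance, used here only as a stand-in for the pairwise measure P(S, A_i)
    return np.linalg.norm(x - y)

def group_proximity(S, A, mode="average"):
    # proximity between a single description S and a group [A] = {A_1, ..., A_n},
    # derived from the pairwise values P(S, A_i)
    values = [pairwise_dist(S, Ai) for Ai in A]
    if mode == "min":
        return min(values)
    if mode == "max":
        return max(values)
    if mode == "average":
        return sum(values) / len(values)
    if mode == "centroid":
        # consensus value <A>: the vectorial average (centroid) of the A_i
        return pairwise_dist(S, np.mean(A, axis=0))
    raise ValueError(f"unknown mode: {mode}")

S = np.array([1.0, 2.0])
A = [np.array([0.0, 0.0]), np.array([2.0, 2.0]), np.array([1.0, 4.0])]
for mode in ("min", "max", "average", "centroid"):
    print(mode, round(group_proximity(S, A, mode), 3))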
Proximity measures between two groups of objects [A] and [B] can be defined in a similar way: we can take the minimum, maximum or average of the P(A_i, B_j) proximity measures, or determine the proximity of the two centroids, P(<A>, <B>).
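The two-group case can be sketched the same way, reusing pairwise_dist from the example above (again, the Euclidean distance is only a placeholder for whatever measure P is in use):

def group_group_proximity(A, B, mode="average"):
    # proximity between two groups [A] and [B] from the pairwise P(A_i, B_j) values
    values = [pairwise_dist(Ai, Bj) for Ai in A for Bj in B]
    if mode == "min":
        return min(values)
    if mode == "max":
        return max(values)
    if mode == "average":
        return sum(values) / len(values)
    if mode == "centroids":
        # proximity of the two centroids, P(<A>, <B>)
        return pairwise_dist(np.mean(A, axis=0), np.mean(B, axis=0))
    raise ValueError(f"unknown mode: {mode}")

The minimum, maximum and average variants correspond to the single-, complete- and average-linkage rules familiar from hierarchical clustering.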
If a single object is compared to group [A] in terms of a feature f that is supposed to be normally distributed in [A], with mean m and standard deviation sd, then, instead of the simple difference f − m, we can use the scaled value (f − m)/sd for calculating a distance between an object and the group. Similarly, one can calculate a distance between two groups (denoted by upper indices 1 and 2, respectively) using the values

(m^1 − m^2) / sqrt((sd^1)^2 + (sd^2)^2)

The resulting distance values will thus incorporate a natural scaling based on the different variances of the groups. This scaling can be generalized to cases in
which the objects to be compared are represented as vectors of features f_1, f_2, ..., f_n, characterized by a covariance matrix C. In this case, the so-called Mahalanobis distance is defined as [13]:
MD = (m^1 − m^2)' C^{-1} (m^1 − m^2)
where m^1 and m^2 are the average vectors for group 1 and group 2, respectively, (m^1 − m^2)' is the transpose of (m^1 − m^2), and C^{-1} is the inverse of the variance-covariance matrix C. MD can be viewed as a Euclidean distance scaled by the covariance matrix, the latter being assumed to be identical for both groups.
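A minimal numeric sketch of this calculation (in Python with NumPy; the pooled covariance estimate and the two random sample groups are illustrative assumptions, not prescribed by the text):

import numpy as np

def mahalanobis_group_distance(X1, X2):
    # MD = (m1 - m2)' C^{-1} (m1 - m2), as in the formula above
    # (this quadratic form is often denoted D^2 in the literature);
    # C is a pooled variance-covariance matrix, assumed identical for both groups
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    n1, n2 = len(X1), len(X2)
    C = ((n1 - 1) * np.cov(X1, rowvar=False) +
         (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    d = m1 - m2
    return float(d @ np.linalg.inv(C) @ d)

rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 1.0, size=(50, 2))  # group 1: rows = objects, columns = features
X2 = rng.normal([2.0, 1.0], 1.0, size=(60, 2))  # group 2
print(mahalanobis_group_distance(X1, X2))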
3. Matching (alignment)
For two structures to be similar, one has to find a matching in terms of entities and
relationships. Such a matching is shown in Figure 3. A matching resembles an analogy. In