the comparison of distributions, such as the χ² test and contingency table analysis [37], all of which yield probability values between 0 and 1.
Probability-based measures are widely used for the evaluation of prediction
methods [32, 33]. Similarity measures for chemical structures have been reviewed by
Willett [31].
2.9 Proximity measures for groups of objects
Proximity measures originally defined for pairs of structural descriptions can be generalized
to groups. Given a single description S and a group of descriptions [A] = {A1, A2, …, An}, a
proximity measure between S and [A] can be defined using the P(S, Ai) values of the
pairwise comparisons; for example, one can take the minimum, the maximum, or the average
of the P(S, Ai) values as the proximity measure between S and the group. Another possibility
is to calculate from the descriptions Ai a "consensus value" <A>, sometimes called the
centroid of [A]. If the descriptions are simple numeric values, <A> can be defined as their
average; if the Ai are vectors, <A> can be their vectorial average, etc. Then, the proximity
measure between S and [A] can be calculated as P(S, <A>).
Proximity measures between two groups of objects [A] and [B] can be defined in a similar
way: we can take the minimum, maximum, or average of the pairwise P(Ai, Bj) proximity
measures, or determine the proximity of the two centroids, P(<A>, <B>).
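The object-to-group variants above can be sketched in a few lines. This is an illustrative implementation, not from the source: the function names and the use of Euclidean distance as the pairwise measure P are assumptions for the sake of a concrete example.

```python
import math

def euclidean(x, y):
    """One possible pairwise proximity P: Euclidean distance between vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def centroid(group):
    """Component-wise average <A> of a group of vectors."""
    n = len(group)
    return [sum(v[i] for v in group) / n for i in range(len(group[0]))]

def group_proximity(s, group, dist=euclidean, mode="average"):
    """Proximity of a single object s to a group [A], using one of the
    strategies described in the text: min, max, average of the pairwise
    P(s, Ai) values, or the distance to the centroid <A>."""
    values = [dist(s, a) for a in group]
    if mode == "min":
        return min(values)
    if mode == "max":
        return max(values)
    if mode == "average":
        return sum(values) / len(values)
    if mode == "centroid":
        return dist(s, centroid(group))
    raise ValueError(f"unknown mode: {mode}")
```

A group-to-group measure follows the same pattern, taking the min/max/average over all pairwise P(Ai, Bj) values or comparing the two centroids.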
If a single object is compared to group [A] in terms of a feature f that is supposed to be
normally distributed in [A], with mean m and standard deviation sd, then, instead of the
simple difference f − m, we can use the scaled value (f − m)/sd for calculating a distance
between the object and the group. Similarly, one can calculate a distance between two
groups (denoted by upper indices 1 and 2, respectively) using the value
(m1 − m2)/√((sd1)² + (sd2)²). The resulting distance values will thus incorporate a natural
scaling based on the different variances of the groups. This scaling can be generalized to
cases in which the objects to be compared are represented as vectors of features
f1, f2, …, fn characterized by a covariance matrix C. In this case, the so-called
Mahalanobis distance is defined as:
MD = (m1 − m2)' C⁻¹ (m1 − m2)    [13]
where m1 and m2 are the average vectors of group 1 and group 2, respectively,
(m1 − m2)' is the transpose of (m1 − m2), and C⁻¹ is the inverse of the
variance-covariance matrix C. MD can be viewed as a Euclidean distance scaled by the
covariance matrix, the latter being assumed to be identical for both groups.
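Equation [13] can be sketched as follows. This is an illustrative implementation, not from the source: the function name is hypothetical, and estimating the common covariance matrix C as the pooled sample covariance of the two groups is one assumed choice among several.

```python
import numpy as np

def mahalanobis_between_groups(group1, group2):
    """Quadratic form (m1 - m2)' C^-1 (m1 - m2) between two groups of
    feature vectors, with C estimated as the pooled covariance matrix
    (assumed identical for both groups, as in the text)."""
    g1 = np.asarray(group1, dtype=float)
    g2 = np.asarray(group2, dtype=float)
    m1, m2 = g1.mean(axis=0), g2.mean(axis=0)
    n1, n2 = len(g1), len(g2)
    # Pooled variance-covariance matrix: weighted average of the two
    # sample covariance matrices.
    c = ((n1 - 1) * np.cov(g1, rowvar=False) +
         (n2 - 1) * np.cov(g2, rowvar=False)) / (n1 + n2 - 2)
    d = m1 - m2
    return float(d @ np.linalg.inv(c) @ d)
```

For a single feature, this reduces to the scaled squared difference of the two means discussed above, since C collapses to the (pooled) variance.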
3. Matching (alignment)
For two structures to be similar, one has to find a matching in terms of entities and
relationships. Such a matching is shown in Figure 3. A matching resembles an analogy. In