Information Technology Reference
In-Depth Information
be a large disagreement. We use the formula for OntoRandIdx ( U, V ) given above,
where the functions δ U and δ V take this intuition into account. That is, rather than
returning merely 1 or 0 depending on whether the given two clusters are the same
or not, the functions δ U and δ V should return a real number from the range [0 , 1],
expressing a measure of how closely related the two clusters are.
By plugging in various definitions of the functions δ U and δ V , we can obtain a
family of similarity measures for ontologies, suitable for comparing an ontology with
the gold standard in the context of the task that has been discussed in Section 11.4.1.
We propose two concrete families of δ U and δ V . Since the definitions of δ U and δ V
will always be analogous to each other and differ only in the fact that each applies
to a different ontology, we refer only to the δ U function in the following discussion.
Similarity Based on Common Ancestors
One possibility is inspired by the approach that is sometimes used to evaluate the
performance of classification models for classification in hierarchies (see, e.g., [19]),
and that could incidentally also be useful in the context of e.g., evaluating an auto-
matic ontology population system. Given a concept U i in the ontology U ,let A ( U i )
be the set of all ancestors of this concept, i.e., all concepts on the path from the
root to U i (including U i itself). If two concepts U i and U j have a common parent,
the sets A ( U i ) and A ( U j ) will have a large intersection; on the other hand, if they
have no common parent except the root, the intersection of A ( U i ) and A ( U j ) will
contain only the root concept. Thus the size of the intersection can be taken as a
measure of how closely related the two concepts are.
δ U ( U i ,U j )= |A ( U i ) ∩ A ( U j ) |/|A ( U i ) ∪ A ( U j ) |.
(11.2)
This measure (also known as the Jaccard coe cient) has the additional nice char-
acteristic that it can be extended to cases where U is not a tree but an arbitrary
directed acyclic graph. If the arrows in this graph point from parents to children,
the set A ( U i ) is simply the set of all nodes from which U is reachable.
Similarity Based on Distance in the Tree
An alternative way to define a suitable function δ U would be to work directly with
the distances between U i and U j in the tree U . In this case, let l be the distance
between U i and U j in the tree (length of the path from U i to the common ancestor
of U i and U j , and thence down to U j ), and h be the depth of the deepest common
ancestor of U i and U j .If l is large, this is a sign that U i and U j are not very closely
related; similarly, if h is small, this is a sign that U i and U j don't have any common
ancestors except very general concepts close to the root, and therefore U i and U j
aren't very closely related. There are various ways of taking these intuitions into
account in a formula for δ U as a function of l and h . For example, Rada et al. [23]
have proposed a distance measure of the form:
δ ( l, h )= e −αl tanh( βh )
(11.3)
Here, α and β are nonnegative constants, and tanh is the hyperbolic tangent
tanh( x )=( e x
− e −x ) / ( e x + e −x )=1 2 / (1 + e 2 x ) .
Search WWH ::




Custom Search