compare a learned ontology to the gold-standard ontology (since both will be, in the
context of this ontology learning task, two hierarchical partitions of the same set of
instances).
One popular measure of agreement between two flat partitions is the Rand index
[24]. Assume that there is a set of instances $O = \{o_1, \ldots, o_n\}$,¹ with two partitions
of $O$ into families of disjoint subsets, $U = \{U_1, \ldots, U_m\}$ and $V = \{V_1, \ldots, V_k\}$,
where $\bigcup_{i=1..m} U_i = O$, $\bigcup_{j=1..k} V_j = O$, $U_i \cap U_{i'} = \emptyset$ for each $1 \le i < i' \le m$, and
$V_j \cap V_{j'} = \emptyset$ for each $1 \le j < j' \le k$. Then one way to compare the partitions $U$
and $V$ is to count the agreements and disagreements in the placement of instances
into clusters. If two items $o_i, o_j \in O$ belong to the same cluster of $U$ but to two
separate clusters of $V$, or vice versa, this is considered a disagreement. On the other
hand, if they belong to the same cluster in both partitions, or to separate clusters
in both partitions, this is considered an agreement between partitions. The Rand
index between $U$ and $V$ is the number of agreements relative to the total number of
pairs of instances (i.e., to $n(n-1)/2$).
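The pair-counting definition above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `rand_index`, the representation of a partition as a dictionary from instance to cluster label, and the four-instance example are assumptions made here, not part of the original text.

```python
from itertools import combinations


def rand_index(u, v):
    """Rand index of two flat partitions of the same instance set.

    Each partition is a dict mapping an instance to its cluster label
    (a representation assumed for this sketch).
    """
    if u.keys() != v.keys():
        raise ValueError("both partitions must cover the same instances")
    instances = list(u)
    n = len(instances)
    if n < 2:
        raise ValueError("at least two instances are needed to form a pair")

    agreements = 0
    for a, b in combinations(instances, 2):
        same_in_u = u[a] == u[b]
        same_in_v = v[a] == v[b]
        # Agreement: together in both partitions, or separated in both.
        if same_in_u == same_in_v:
            agreements += 1

    # Normalize by the total number of pairs, n(n-1)/2.
    return agreements / (n * (n - 1) / 2)


# Hypothetical example: four instances, two partitions.
u_example = {"o1": "A", "o2": "A", "o3": "B", "o4": "B"}
v_example = {"o1": "X", "o2": "X", "o3": "X", "o4": "Y"}
print(rand_index(u_example, v_example))
```

In this example the two partitions agree on three of the six pairs, namely (o1, o2), (o1, o4) and (o2, o4), so the index is 0.5.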
11.4.3 A Similarity Measure for Ontologies
We can elegantly formulate a similarity measure over ontologies by rephrasing the
Rand index as follows. Let us denote by $U(o)$ the cluster of $U$ that contains the
instance $o \in O$, and similarly by $V(o)$ the cluster of $V$ that contains the instance
$o \in O$. Let $\delta_X(X_i, X_j)$ be some distance measure between clusters $X_i$ and $X_j$ of a
partition $X$. Then we define the OntoRand index by the following formula:
$$\mathrm{OntoRandIdx}(U, V) = 1 - \frac{\sum_{1 \le i < j \le n} \left| \delta_U(U(o_i), U(o_j)) - \delta_V(V(o_i), V(o_j)) \right|}{n(n-1)/2}. \tag{11.1}$$
If we define $\delta_U(U_i, U_j) = 1$ if $U_i = U_j$ and $\delta_U(U_i, U_j) = 0$ otherwise, and define
$\delta_V$ analogously, we can see that the Rand index is a special case of
our OntoRand index. That is, the term bracketed by $|\ldots|$ in eq. (11.1) equals 1 if
there is a disagreement between $U$ and $V$ concerning the placement of the pair of
instances $o_i$ and $o_j$, and 0 otherwise. The sum over all $i$ and $j$ therefore counts the number of pairs
where a disagreement occurs, and subtracting its normalized value from 1 gives the
proportion of agreements, i.e., the Rand index.
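The special case can be checked numerically with a small sketch. It reads eq. (11.1) as one minus the sum of absolute differences between the two cluster-level measures, divided by the number of pairs. The names `onto_rand_index` and `indicator`, the dictionary representation of partitions, and the example data are all assumptions made for illustration.

```python
from itertools import combinations


def onto_rand_index(u, v, delta_u, delta_v):
    """OntoRand index as read from eq. (11.1).

    u, v: dicts mapping each instance to its cluster (assumed representation).
    delta_u, delta_v: functions of two clusters returning a number.
    """
    if u.keys() != v.keys():
        raise ValueError("both partitions must cover the same instances")
    instances = list(u)
    n = len(instances)
    if n < 2:
        raise ValueError("at least two instances are needed to form a pair")

    total = 0.0
    for a, b in combinations(instances, 2):
        # Per-pair disagreement: |delta_U(U(o_i), U(o_j)) - delta_V(V(o_i), V(o_j))|
        total += abs(delta_u(u[a], u[b]) - delta_v(v[a], v[b]))

    return 1.0 - total / (n * (n - 1) / 2)


def indicator(x, y):
    """1 if the two clusters are identical, 0 otherwise (the flat special case)."""
    return 1 if x == y else 0


# Hypothetical example: four instances, two flat partitions.
u_flat = {"o1": "A", "o2": "A", "o3": "B", "o4": "B"}
v_flat = {"o1": "X", "o2": "X", "o3": "X", "o4": "Y"}
print(onto_rand_index(u_flat, v_flat, indicator, indicator))
```

With the indicator function plugged in for both partitions, three of the six pairs contribute a disagreement of 1, so the result is 1 - 3/6 = 0.5, which is the ordinary Rand index of these two partitions. Passing the measures in as functions leaves room for a hierarchy-aware choice without changing the summation.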
When we apply the OntoRand index for the purpose of comparing ontologies,
we must take the hierarchical arrangement of concepts into account. In the original
Rand index, what matters for a particular pair of instances is simply whether they
belong to the same cluster or not. However, when concepts or clusters are organized
hierarchically, not all pairs of distinct clusters are equally different. For example, two
concepts with a common parent in the tree are likely to be quite similar even though
they are not exactly the same; on the other hand, two concepts that do not have any
common ancestor except the root of the tree are probably highly unrelated. Thus, if
one ontology places a pair of instances in the same concept while the other ontology
places this pair of instances in two different concepts with a common parent, this
is a disagreement, but not a very strong one; on the other hand, if the second
ontology places the two instances into two completely unrelated concepts, this would
1 In this section, O stands only for the set of instances, not for an entire ontology
as in Sec. 11.3. We use O instead of I for the set of instances to prevent confusion
with the use of i as an index in subscripts.
 