Automatic Evaluation of Ontologies - Natural Language Processing and Text Mining

compare a learned ontology to the gold-standard ontology (since both will be, in the

context of this ontology learning task, two hierarchical partitions of the same set of

instances).

One popular measure of agreement between two flat partitions is the Rand index

[24]. Assume that there is a set of instances $O = \{o_1, \ldots, o_n\}$,¹ with two partitions

of $O$ into a family of disjoint subsets, $U = \{U_1, \ldots, U_m\}$ and $V = \{V_1, \ldots, V_k\}$,

where $\bigcup_{i=1..m} U_i = O$, $\bigcup_{j=1..k} V_j = O$, $U_i \cap U_{i'} = \emptyset$ for each $1 \le i < i' \le m$, and

$V_j \cap V_{j'} = \emptyset$ for each $1 \le j < j' \le k$. Then one way to compare the partitions $U$

and V is to count the agreements and disagreements in the placement of instances

into clusters. If two items $o_i, o_j \in O$ belong to the same cluster of $U$ but to two

separate clusters of V , or vice versa, this is considered a disagreement. On the other

hand, if they belong to the same cluster in both partitions, or to separate clusters

in both partitions, this is considered an agreement between partitions. The Rand

index between U and V is the number of agreements relative to the total number of

pairs of instances (i.e., to $n(n-1)/2$).
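
The pair-counting definition above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the source: the function name `rand_index` and the representation of each partition as a list of cluster labels (one label per instance, in the same instance order) are assumptions made for the example.

```python
from itertools import combinations


def rand_index(u_labels, v_labels):
    """Rand index between two flat partitions of the same instances.

    u_labels[i] and v_labels[i] are the cluster labels of instance o_i
    in partitions U and V (an assumed representation).
    """
    if len(u_labels) != len(v_labels):
        raise ValueError("both partitions must cover the same instances")
    n = len(u_labels)
    if n < 2:
        raise ValueError("at least two instances are needed to form a pair")

    agreements = 0
    for i, j in combinations(range(n), 2):
        same_in_u = u_labels[i] == u_labels[j]
        same_in_v = v_labels[i] == v_labels[j]
        # Agreement: together in both partitions, or separated in both.
        if same_in_u == same_in_v:
            agreements += 1

    # Normalize by the total number of pairs, n(n-1)/2.
    return agreements / (n * (n - 1) / 2)


u = ["a", "a", "b", "b"]
v = ["x", "x", "x", "y"]
print(rand_index(u, v))  # 3 agreeing pairs out of 6 -> 0.5
```

In this toy input, the pairs (o_1, o_2), (o_1, o_4) and (o_2, o_4) are agreements, and the remaining three pairs are disagreements.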

11.4.3 A Similarity Measure for Ontologies

We can elegantly formulate a similarity measure over ontologies by rephrasing the

Rand index as follows. Let us denote by $U(o)$ the cluster of $U$ that contains the

instance $o \in O$, and similarly by $V(o)$ the cluster of $V$ that contains the instance

$o \in O$. Let $\delta_X(X_i, X_j)$ be some distance measure between clusters $X_i$ and $X_j$ of a

partition $X$. Then we define the OntoRand index by the following formula:

$$
\mathrm{OntoRandIdx}(U, V) = 1 - \frac{\sum_{1 \le i < j \le n} \left| \delta_U(U(o_i), U(o_j)) - \delta_V(V(o_i), V(o_j)) \right|}{n(n-1)/2}. \tag{11.1}
$$

If we define $\delta_U(U_i, U_j) = 1$ if $U_i = U_j$ and $\delta_U(U_i, U_j) = 0$ otherwise, and define $\delta_V$

in an analogous manner, we can see that the Rand index is a special case of

our OntoRand index. That is, the term bracketed by $|\ldots|$ in eq. (11.1) equals 1 if

there is a disagreement between $U$ and $V$ concerning the placement of the pair of

instances $o_i$ and $o_j$, and 0 otherwise. The sum over all $i$ and $j$ therefore counts the number of pairs

where a disagreement occurs.
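
A minimal sketch of eq. (11.1) with a pluggable cluster measure may help make the special case concrete. The names `onto_rand_index` and `indicator`, and the label-list representation of the partitions, are assumptions introduced for this illustration and do not come from the source.

```python
from itertools import combinations


def onto_rand_index(u_of, v_of, delta_u, delta_v):
    """OntoRand index following eq. (11.1).

    u_of[i] and v_of[i] stand for U(o_i) and V(o_i); delta_u and delta_v
    are caller-supplied functions on pairs of clusters.
    """
    if len(u_of) != len(v_of):
        raise ValueError("both partitions must cover the same instances")
    n = len(u_of)
    if n < 2:
        raise ValueError("at least two instances are needed to form a pair")

    # Sum of |delta_U(...) - delta_V(...)| over all pairs i < j.
    total = sum(
        abs(delta_u(u_of[i], u_of[j]) - delta_v(v_of[i], v_of[j]))
        for i, j in combinations(range(n), 2)
    )
    return 1 - total / (n * (n - 1) / 2)


def indicator(a, b):
    """1 if both arguments are the same cluster, 0 otherwise."""
    return 1 if a == b else 0


u = ["a", "a", "b", "b"]
v = ["x", "x", "x", "y"]
# With the indicator for both partitions, each summand is 1 exactly when
# the pair is a disagreement, so the result is the plain Rand index.
print(onto_rand_index(u, v, indicator, indicator))  # 1 - 3/6 = 0.5
```

Passing the measure as a function keeps the formula itself fixed while leaving the choice of $\delta$ open.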

When we apply the OntoRand index for the purpose of comparing ontologies,

we must take the hierarchical arrangement of concepts into account. In the original

Rand index, what matters for a particular pair of instances is simply whether they belong

to the same cluster or not. However, when concepts or clusters are organized

hierarchically, not all pairs of different clusters are equally different. For example, two

concepts with a common parent in the tree are likely to be quite similar even though

they are not exactly the same; on the other hand, two concepts that do not have any

common ancestor except the root of the tree are probably highly unrelated. Thus, if

one ontology places a pair of instances in the same concept while the other ontology

places this pair of instances in two different concepts with a common parent, this

is a disagreement, but not a very strong one; on the other hand, if the second ontology

places the two instances into two completely unrelated concepts, this would

¹ In this section, O stands only for the set of instances, not for an entire ontology

as in Sec. 11.3. We use O instead of I for the set of instances to prevent confusion

with the use of i as an index in subscripts.
