Automatic Evaluation of Ontologies - Natural Language Processing and Text Mining - page 208

Fig. 11.1. Evaluation of ontologies that lack lower levels, based on the OntoRand index. The overlap-based similarity measure uses eq. (11.2) to define $\delta_U$, while the tree-distance based similarity measure uses eq. (11.3). The dotted line shows an analytical approximation of the OntoRand values based on the overlap similarity measure.

would estimate the similarity of this ontology to the gold standard (having an average node depth of approx. 7) as 0.94. On the other hand, if we stopped after at most three levels, the OntoRand index would be 0.74.
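
The pruning comparison can be sketched in Python under some stated assumptions. The sketch assumes that OntoRand is 1 minus the average of $|\delta_U - \delta_V|$ over all instance pairs. It also assumes that the overlap measure of eq. (11.2) divides the number of shared ancestors of the two concepts by the size of the union of their ancestor sets, where each set includes the concept itself and the root. The helper names (`ancestors`, `overlap_delta`, `onto_rand`, `prune`) are hypothetical. The gold standard used here is also hypothetical: a synthetic complete binary tree of depth 6 with one instance per leaf. The sketch therefore illustrates the procedure and does not reproduce the values 0.94 and 0.74.

```python
from itertools import combinations, product


def ancestors(concept):
    """Ancestor set of a concept, including the concept itself and the root.

    Concepts are tuples giving the path from the root, e.g. (0, 1, 1);
    the root is the empty tuple (), so the ancestors are all prefixes.
    """
    return {concept[:i] for i in range(len(concept) + 1)}


def overlap_delta(c1, c2):
    """Assumed overlap similarity: |shared ancestors| / |union of ancestors|."""
    a1, a2 = ancestors(c1), ancestors(c2)
    return len(a1 & a2) / len(a1 | a2)


def onto_rand(assign_u, assign_v, delta=overlap_delta):
    """1 - mean |delta_U - delta_V| over all pairs of instances.

    assign_u[i] and assign_v[i] are the concepts holding instance i in U and V.
    """
    pairs = list(combinations(range(len(assign_u)), 2))
    total = sum(
        abs(delta(assign_u[i], assign_u[j]) - delta(assign_v[i], assign_v[j]))
        for i, j in pairs
    )
    return 1.0 - total / len(pairs)


def prune(concept, k):
    """Keep only depths 0..k-1 (k >= 1); deeper concepts move up to their depth-(k-1) ancestor."""
    return concept[:k - 1]


# Hypothetical gold standard: complete binary tree of depth 6, one instance per leaf.
leaves = list(product(range(2), repeat=6))
for k in range(1, 8):
    pruned = [prune(c, k) for c in leaves]
    print(k, round(onto_rand(leaves, pruned), 3))
```

With `k = 7` nothing is removed, so the pruned hierarchy matches the original. Smaller `k` values collapse more instances into shared upper-level concepts.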

It may be somewhat surprising that the similarity of an ontology to the original one is still as high as 0.74 even if only the top three levels of the ontology have been kept. To understand this, consider a pair of random concepts. In the original hierarchy, they are typically unrelated and located around the 7th level. Their ancestor sets in eq. (11.2) therefore share only the root, giving an intersection of 1 and a union of around 13, so the overlap measure is $\delta \approx 1/13$. In the pruned hierarchy, only the $k$ uppermost levels have been retained, and documents from lower nodes have been reassigned to their ancestor nodes at level $k-1$. There, such a random pair of documents again shares only the root and yields $\delta$ around $1/(2k-1)$. Such pairs of documents therefore push the OntoRand index value towards $1 - |1/13 - 1/(2k-1)|$. As the “analytical approximation” in the chart shows, this is not an altogether bad predictor of the shape of the curve for the overlap-based measure.
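
The back-of-the-envelope argument above can be written out as a short Python sketch. It assumes the root sits at depth 0, so the text's "7th level" corresponds to depth 6. Under that convention a concept's ancestor set (itself, its ancestors, and the root) has depth + 1 members. Two concepts that share only the root then have an overlap of $1/(2 \cdot \text{depth} + 1)$. This gives 1/13 at depth 6 and $1/(2k-1)$ at depth $k-1$. The function names are hypothetical.

```python
def unrelated_pair_delta(depth):
    """Overlap delta for two concepts at `depth` (root = depth 0) that share only the root.

    Each ancestor set has depth + 1 members; the intersection is {root},
    so the union has 2 * depth + 1 members.
    """
    return 1.0 / (2 * depth + 1)


def approx_onto_rand(k, original_depth=6):
    """Approximate OntoRand value when only the k uppermost levels (depths 0..k-1) are kept.

    A typical unrelated pair sits at `original_depth` in the original tree
    (delta ~ 1/13 for depth 6) and at depth k - 1 after pruning (delta ~ 1/(2k - 1)).
    """
    return 1.0 - abs(unrelated_pair_delta(original_depth) - unrelated_pair_delta(k - 1))


for k in range(1, 8):
    print(k, round(approx_onto_rand(k), 3))
```

The approximation reaches exactly 1 when `k - 1` equals `original_depth`, because the two deltas then coincide.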

The tree-distance similarity measure is slightly more problematic in this scenario. In the original tree, a typical random pair of instances falls into unrelated concepts that have no common ancestors except the root. In that case $h = 0$ and thus $\delta = 0$ (or $\delta$ is close to 0 even if $h > 0$). Suppose a few of the deepest levels of the tree are removed and the instances are reassigned to suitable ancestor concepts. Any pair of instances that used to have $h = 0$ will still have $h = 0$. Its $\delta$ according to eq. (11.3) therefore remains unchanged, and the pair does not help decrease the similarity between the new hierarchy and the original one. This is why the similarity as measured by OntoRand remains relatively high throughout. Only concept pairs with $h > 0$ contribute towards the dissimilarity. For these pairs, the distance ($l$ in eq. (11.3)) decreases when the lower levels are pruned away and the instances are moved to higher-level concepts. Because $l$ appears in the term $e^{-\alpha l}$, decreasing $l$ increases the value of $\delta$ for that pair of instances. The more levels we prune away, the larger $\delta$ becomes compared to its original value, and the OntoRand similarity decreases accordingly. A quirk occurs at the very end,
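
The behaviour of the tree-distance measure can be checked with a small Python sketch. Eq. (11.3) is not reproduced in this section, so the sketch assumes the form $\delta = e^{-\alpha l} \tanh(\beta h)$. This form matches the two properties the text relies on: $\delta = 0$ whenever $h = 0$, and the factor $e^{-\alpha l}$. The values of $\alpha$ and $\beta$ are hypothetical, as is the helper `prune_pair`, which recomputes $l$ and $h$ for a pair after only depths 0 to $k-1$ are kept.

```python
import math

# Hypothetical parameter values; this section does not state alpha or beta.
ALPHA, BETA = 0.3, 0.2


def tree_distance_delta(l, h, alpha=ALPHA, beta=BETA):
    """Assumed form of eq. (11.3): exp(-alpha * l) * tanh(beta * h).

    l: length of the path between the two concepts.
    h: depth of their deepest common ancestor (root = depth 0).
    """
    return math.exp(-alpha * l) * math.tanh(beta * h)


def prune_pair(depth_a, depth_b, h, k):
    """Recompute (l, h) for a pair of concepts after keeping only depths 0..k-1.

    Each instance moves up to its ancestor at depth min(depth, k - 1). If the
    common ancestor survives (h <= k - 1) it stays the deepest common ancestor;
    otherwise both instances land in the same depth-(k - 1) concept (l = 0).
    """
    new_h = min(h, k - 1)
    new_a = min(depth_a, k - 1)
    new_b = min(depth_b, k - 1)
    return (new_a - new_h) + (new_b - new_h), new_h


# A pair with h = 0 keeps delta = 0 however much of the tree is pruned.
print(tree_distance_delta(*prune_pair(6, 6, 0, 3)))

# Two depth-6 concepts below a depth-2 ancestor: l drops from 8 to 2 when
# only depths 0..3 are kept (k = 4), so delta grows for this pair.
before = tree_distance_delta(8, 2)
after = tree_distance_delta(*prune_pair(6, 6, 2, 4))
print(before, after)
```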