Fig. 11.1. Evaluation of ontologies that lack lower levels, based on the OntoRand
index. The overlap-based similarity measure uses eq. (11.2) to define δ_U, while the
tree-distance based similarity measure uses eq. (11.3). The dotted line shows an
analytical approximation of the OntoRand values based on the overlap similarity
measure.
would estimate the similarity of this ontology to the gold standard (having an
average node depth of approx. 7) as 0.94. On the other hand, if we stopped after at
most three levels, the OntoRand index would be 0.74.
It may be somewhat surprising that the similarity of an ontology to the original
one is still as high as 0.74 even if only the top three levels of the ontology have
been kept. To understand this, consider a pair of random concepts; in the original
hierarchy, they are typically unrelated and are located around the 7th level, so
the ancestor sets of eq. (11.2) have an intersection of 1 (only the root) and a union
of around 13, resulting in the overlap measure δ ≈ 1/13. In the pruned hierarchy, where
only the k uppermost levels have been retained and documents from lower nodes have been
reassigned to their ancestor nodes at level k − 1, such a random pair of documents would
yield δ around 1/(2k − 1). Thus such pairs of documents would push the OntoRand index
value towards 1 − |1/13 − 1/(2k − 1)|. As the “analytical approximation” in the
chart shows, this is not an altogether bad predictor of the shape of the curve for the
overlap-based measure.
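
The approximation above can be tabulated with a few lines of Python. This is only a sketch of the formula 1 − |1/13 − 1/(2k − 1)| as stated in the text; the function names, and the assumption that every random pair sits at the 7th level and shares only the root, are illustrative simplifications and not part of the original evaluation code.

```python
def overlap_delta(levels: int) -> float:
    """Overlap of two unrelated concepts whose ancestor sets each contain
    `levels` nodes and share only the root: intersection 1, union 2*levels - 1."""
    return 1.0 / (2 * levels - 1)


def approx_ontorand(k: int, original_levels: int = 7) -> float:
    """Analytical approximation 1 - |1/13 - 1/(2k - 1)| from the text.

    `original_levels` = 7 is an assumption taken from the "around the 7th
    level" remark; it gives the union of 13 used in the text.
    """
    kept = min(k, original_levels)  # pruning cannot add levels
    return 1.0 - abs(overlap_delta(original_levels) - overlap_delta(kept))


if __name__ == "__main__":
    # Print the approximation for each number of retained levels.
    for k in range(1, 8):
        print(k, round(approx_ontorand(k), 3))
```

For k = 3 this sketch gives about 0.88, compared with the measured value of 0.74 quoted above, which fits the remark that the approximation follows the shape of the curve rather than its exact values.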
The tree-distance similarity measure is slightly more problematic in this scenario.
In the original tree, a typical random pair of instances falls into unrelated concepts
that have no common ancestors except the root, i.e., h = 0 and thus δ = 0 (or δ
close to 0 even if h > 0). If a few deepest levels of the tree are removed and instances
reassigned to the suitable ancestor concepts, any pair of instances that used to have
h = 0 will still have h = 0; thus its δ according to eq. (11.3) remains unchanged and
this pair does not help decrease the similarity measure between the new hierarchy
and the original one. This is why the similarity as measured by OntoRand remains
relatively high all the time. Only concept pairs with h > 0 contribute towards the
dissimilarity, because their distance (l in eq. (11.3)) decreases if the lower levels are
pruned away and the instances moved to higher-level concepts. Because l is used in
the term e^(−αl), decreasing l causes the value of δ to increase for that pair of instances;
the more levels we prune away, the larger δ will be compared to its original value,
and the OntoRand similarity decreases accordingly. A quirk occurs at the very end,