Automatic Evaluation of Ontologies - Natural Language Processing and Text Mining

Information Technology Reference

In-Depth Information

From a purely algorithmic point of view, it would also be interesting to explore

if the ontology similarity measure as currently defined in Section 11.4.3 can be

accurately computed in sub-quadratic time (in terms of the number of instances).

The experimental evaluation in Section 11.5 could be extended with various other

operations. For example, we could split existing leaf concepts into subconcepts, either

randomly or using some clustering technique. This is the converse of the operation of

removing the leaf concepts described in Section 11.5.1. Another possibly interesting

operation would be to merge two or more sibling concepts. As the experiments

with switching a concept and its parent showed (Sec. 11.5.2), a rearrangement of

concepts in the upper levels of the tree (in our case we were switching a third-level

concept and its parent, which is a second-level concept) might have only a very

small effect on the similarity measure. Depending on the intended application, this

may be undesirable from a human point of view because changes in the upper levels

correspond to significantly different decisions regarding the conceptualization of the

main concepts (especially the more abstract ones) of the domain of interest. These

are important decisions that occur in the early stages of ontology construction;

therefore, it might be helpful if our similarity measures could be extended to be

more sensitive to such differences in the organization of concepts in the upper levels

of the ontology.

11.6.1 Evaluation without Assuming the Same Set of Instances

The proposed approach presented in Section 11.4 assumes that we are comparing

two ontologies based on the same set of instances (but with different sets of concepts,

different assignment of instances to concepts and different arrangement of concepts

into a hierarchy). One way to extend this approach would be to allow for comparison

of ontologies based on different sets of instances. In this case it is no longer possible to

take a pair of instances and observe where they are placed in one ontology and where

in the other, because each ontology has its own separate set of instances. Assuming

that each instance is represented by a textual document, some kind of matching

would need to be introduced. Given two instances o i and o j from the ontology

U , one might find a few nearest neighbours of o i and o j in the ontology V , and

observe δ V on the pairs of these nearest neighbours. However, this would introduce

an additional level of time complexity.

Comparison of two ontologies could also be based on the principle of edit dis-

tance. In this case one is looking for a sequence of edit operations that can transform

one ontology into the other, while minimizing the total cost (e.g., the number of

edit operations). However, if the two ontologies have different sets of concepts (and

possibly even different sets of instances), it might be di cult to e ciently find a

minimum-cost sequence of edit operations. Some e cient algorithms for comparing

ordered trees on the edit distance principle are known (see e.g., [4]), but here we

would be dealing with unordered trees.

Another direction that might be promising to explore would be ontology similar-

ity measures based on information-theoretic principles. For example, the variation-

of-information metric for comparing two flat partitions of a set of instances [17] has

been shown to have a number of desirable and theoretically appealing characteris-

tics [18]. Essentially this metric treats cluster membership as a random variable; two

different partitions of a set of instances are treated as two random variables and the

mutual information between them is used as a measure of the similarity of the two

Natural Language Processing and Text Mining

Search WWH ::

Custom Search

Home