Information Technology Reference
In-Depth Information
From a purely algorithmic point of view, it would also be interesting to explore
if the ontology similarity measure as currently defined in Section 11.4.3 can be
accurately computed in sub-quadratic time (in terms of the number of instances).
The experimental evaluation in Section 11.5 could be extended with various other
operations. For example, we could split existing leaf concepts into subconcepts, either
randomly or using some clustering technique. This is the converse of the operation of
removing the leaf concepts described in Section 11.5.1. Another possibly interesting
operation would be to merge two or more sibling concepts. As the experiments
with switching a concept and its parent showed (Sec. 11.5.2), a rearrangement of
concepts in the upper levels of the tree (in our case we were switching a third-level
concept and its parent, which is a second-level concept) might have only a very
small effect on the similarity measure. Depending on the intended application, this
may be undesirable from a human point of view because changes in the upper levels
correspond to significantly different decisions regarding the conceptualization of the
main concepts (especially the more abstract ones) of the domain of interest. These
are important decisions that occur in the early stages of ontology construction;
therefore, it might be helpful if our similarity measures could be extended to be
more sensitive to such differences in the organization of concepts in the upper levels
of the ontology.
11.6.1 Evaluation without Assuming the Same Set of Instances
The proposed approach presented in Section 11.4 assumes that we are comparing
two ontologies based on the same set of instances (but with different sets of concepts,
different assignment of instances to concepts and different arrangement of concepts
into a hierarchy). One way to extend this approach would be to allow for comparison
of ontologies based on different sets of instances. In this case it is no longer possible to
take a pair of instances and observe where they are placed in one ontology and where
in the other, because each ontology has its own separate set of instances. Assuming
that each instance is represented by a textual document, some kind of matching
would need to be introduced. Given two instances o i and o j from the ontology
U , one might find a few nearest neighbours of o i and o j in the ontology V , and
observe δ V on the pairs of these nearest neighbours. However, this would introduce
an additional level of time complexity.
Comparison of two ontologies could also be based on the principle of edit dis-
tance. In this case one is looking for a sequence of edit operations that can transform
one ontology into the other, while minimizing the total cost (e.g., the number of
edit operations). However, if the two ontologies have different sets of concepts (and
possibly even different sets of instances), it might be di cult to e ciently find a
minimum-cost sequence of edit operations. Some e cient algorithms for comparing
ordered trees on the edit distance principle are known (see e.g., [4]), but here we
would be dealing with unordered trees.
Another direction that might be promising to explore would be ontology similar-
ity measures based on information-theoretic principles. For example, the variation-
of-information metric for comparing two flat partitions of a set of instances [17] has
been shown to have a number of desirable and theoretically appealing characteris-
tics [18]. Essentially this metric treats cluster membership as a random variable; two
different partitions of a set of instances are treated as two random variables and the
mutual information between them is used as a measure of the similarity of the two
Search WWH ::




Custom Search