nearest neighbor approach using 10-20 word descriptions to accurately match the
classification of the human editors working for dmoz). In fact, it turns out that
93% of documents are moved to a different concept during the first top-down
reassignment step (or 66% during the first thorough reassignment step). However,
the similarity measure between the new ontology and the original one is nevertheless
fairly high (around 0.74). The reasons for this are: firstly, only the assignment
of documents to concepts has been changed, but not the hierarchical relationship
between the concepts; secondly, if documents are moved to different concepts in a
consistent way, δ_U may change fairly little for most pairs of documents, resulting in
a high OntoRand index value; thirdly, even though 93% of documents were moved
to a different concept, the new concept was often fairly close to the original one.
This is shown on the right chart, where the value of δ_U was computed between the
concept containing a document in the original ontology and the one containing this
document after a certain number of reassignment steps; this was then averaged over
all documents. As this chart shows, even though only 7% of documents remained in
the same concept during the first step of top-down reassignment, the average (over
all documents) δ_U between the original and the new concept is not 0.07 but much
higher, approximately 0.31.
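The averaging described above can be illustrated with a minimal Python sketch. It assumes that the concept similarity is the overlap of ancestor sets (intersection size divided by union size, with each concept counted among its own ancestors); the toy hierarchy, the document names, and the function names are hypothetical and are not taken from the experiments.

```python
def ancestors(concept, parent):
    """Return the set containing `concept` and all of its ancestors up to the root."""
    result = set()
    while concept is not None:
        result.add(concept)
        concept = parent.get(concept)
    return result


def overlap_similarity(a, b, parent):
    """Overlap (intersection over union) of the ancestor sets of concepts a and b.

    Assumption: this stands in for the chapter's ancestor-overlap similarity.
    """
    anc_a, anc_b = ancestors(a, parent), ancestors(b, parent)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


def average_similarity_to_original(original, reassigned, parent):
    """Average, over all documents, of the similarity between old and new concept."""
    total = sum(
        overlap_similarity(original[d], reassigned[d], parent) for d in original
    )
    return total / len(original)


# Hypothetical toy hierarchy: child -> parent (the root maps to None).
parent = {
    "Top": None,
    "Science": "Top",
    "Arts": "Top",
    "Physics": "Science",
    "Chemistry": "Science",
}
original = {"d1": "Physics", "d2": "Physics", "d3": "Chemistry", "d4": "Arts"}
reassigned = {"d1": "Physics", "d2": "Chemistry", "d3": "Science", "d4": "Science"}

# Fraction of documents that stayed in exactly the same concept.
unchanged = sum(original[d] == reassigned[d] for d in original) / len(original)
# Average similarity between each document's old and new concept.
avg = average_similarity_to_original(original, reassigned, parent)
print(f"unchanged: {unchanged:.2f}, average similarity: {avg:.3f}")
```

In this toy case only a quarter of the documents keep their concept, yet the average similarity comes out at roughly 0.625, because the documents that moved mostly landed in a sibling or parent of their original concept. This mirrors, on a small scale, the gap between 0.07 and 0.31 reported above.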
11.6 Discussion and Future Work
The main features of our proposed approach are that it focuses on fully automated
evaluation of ontologies, based on comparison with a gold standard ontology; it does
not make any assumptions regarding the description or representation of instances
and concepts, but assumes that both ontologies have the same set of instances. We
propose a new ontology similarity measure, the OntoRand index, designed by analogy
with the Rand index that is commonly used to compare partitions of a set. We
propose several versions of the OntoRand index based on different underlying
measures of distance between concepts in the ontology. We evaluated the approach on
a large ontology based on the dmoz.org web directory. The experiments were based
on several operations that modify the gold standard ontology in order to simulate
possible discrepancies that may occur if a different ontology is constructed over
the same problem domain (and same set of instances). The experiments show that
the measure based on overlap of ancestor sets (Section 11.4.3) is more convenient
than the measure based on tree distance (Sec. 11.4.3), because the latter requires
the user to define the values of two parameters and it is not obvious how to do
this in a principled way. Additionally, the tree-distance based measure is often less
successful at spreading similarity values over a greater part of the [0, 1] interval; to
address this issue, we propose a modified similarity measure (eq. 11.5), which we will
evaluate experimentally in future work. Another issue, which is shared by both
similarity measures proposed here, is that the resulting OntoRand index is sometimes
insufficiently sensitive to differences that occur in the upper levels of the ontology
(Sec. 11.5.2). Section 11.5.3 indicates another possible drawback of this approach,
namely that keeping the structure of the concept hierarchy and modifying only the
assignment of instances to concepts may not affect the similarity measure as much
as a human observer might expect.
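The last point can be made concrete with a small Python sketch of a Rand-style index over pairs of instances. It is only an illustration under stated assumptions: the index is taken to be one minus the mean absolute difference between the pairwise concept similarities in the two ontologies, and the concept similarity is taken to be the overlap of ancestor sets. If the similarity were instead 1 for instances in the same cluster and 0 otherwise, this form would reduce to the ordinary Rand index. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def ancestors(concept, parent):
    """Return the set containing `concept` and all of its ancestors up to the root."""
    result = set()
    while concept is not None:
        result.add(concept)
        concept = parent.get(concept)
    return result


def overlap_similarity(a, b, parent):
    """Overlap (intersection over union) of the ancestor sets of two concepts."""
    anc_a, anc_b = ancestors(a, parent), ancestors(b, parent)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


def onto_rand_sketch(assign_u, parent_u, assign_v, parent_v):
    """One minus the mean absolute difference of pairwise concept similarities.

    Assumption: both ontologies are defined over the same set of instances.
    """
    pairs = list(combinations(sorted(assign_u), 2))
    total_diff = 0.0
    for i, j in pairs:
        sim_u = overlap_similarity(assign_u[i], assign_u[j], parent_u)
        sim_v = overlap_similarity(assign_v[i], assign_v[j], parent_v)
        total_diff += abs(sim_u - sim_v)
    return 1.0 - total_diff / len(pairs)


# Hypothetical toy hierarchy shared by both ontologies: child -> parent.
parent = {
    "Top": None,
    "Science": "Top",
    "Arts": "Top",
    "Physics": "Science",
    "Chemistry": "Science",
}
gold = {"d1": "Physics", "d2": "Physics", "d3": "Chemistry", "d4": "Arts"}
# Consistent move: Physics and Chemistry documents swap places.
swapped = {"d1": "Chemistry", "d2": "Chemistry", "d3": "Physics", "d4": "Arts"}
# Inconsistent move: documents are scattered across the hierarchy.
scattered = {"d1": "Arts", "d2": "Physics", "d3": "Science", "d4": "Chemistry"}

index_swapped = onto_rand_sketch(gold, parent, swapped, parent)
index_scattered = onto_rand_sketch(gold, parent, scattered, parent)
print(index_swapped, index_scattered)
```

In the swapped case three of the four documents change concept, yet every pairwise similarity is preserved, so the sketch reports the same value as for two identical ontologies. The scattered reassignment lowers the value, but the index still stays well above zero because the hierarchy itself is unchanged.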