partitions. This similarity measure could be extended to hierarchical partitions. It
would need to roughly answer a question such as: How many bits of information do
we need to convey in order to describe, for each instance, where it belongs in the
second hierarchy, if we already know the position of all instances in the first hierarchy? A suitable coding scheme would need to be introduced; e.g., for each concept c of the first hierarchy, find the most similar concept c′ in the second hierarchy; then, for each instance o from c, describe its position in the second hierarchy by listing a sequence of steps (up and down the is-a connections of the hierarchy) that leads from c′ to the concept that actually contains o.
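As an illustration, the cost of such a description could be approximated by the number of up/down steps between two concepts in the second hierarchy. The sketch below assumes a toy hierarchy encoded as a child-to-parent dictionary; the function name and the concept names are hypothetical, not part of the scheme above.

```python
# Sketch of the step-based coding scheme: count the up/down is-a steps
# needed to get from a mapped concept to the concept that actually
# contains an instance. Hierarchy representation and names are illustrative.

def path_steps(parent, src, dst):
    """Count up/down is-a steps from concept `src` to concept `dst`."""
    def ancestors(node):
        chain = [node]
        while node in parent:
            node = parent[node]
            chain.append(node)
        return chain

    up = ancestors(src)
    down = ancestors(dst)
    common = next(a for a in up if a in down)  # lowest common ancestor
    return up.index(common) + down.index(common)

# Toy second hierarchy: animal -> {mammal -> {dog, cat}, bird}
parent2 = {"mammal": "animal", "bird": "animal",
           "dog": "mammal", "cat": "mammal"}

# If instance o actually sits under `dog` but the mapped concept c' is
# `bird`, describing o's true position takes 1 up-step + 2 down-steps.
print(path_steps(parent2, "bird", "dog"))  # -> 3
```

A real coding scheme would then charge each step a number of bits (e.g., based on the branching factor at each concept), but counting steps already captures the intuition that similar hierarchies yield short descriptions.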
11.6.2 Evaluation without a Gold Standard
It would also be interesting to try evaluating an ontology “by itself” rather than
comparing it to a gold standard. This type of evaluation would be useful in many
contexts where a gold standard ontology is not available. One possibility is to have
a partial gold standard, such as a list of important concepts but not a hierarchy;
evaluation could then be based on precision and recall (i.e., observing how many
of the concepts from the gold-standard list also appear in the constructed ontology,
and vice versa). Another scenario arises when a gold standard is unavailable for our domain of interest but exists for some other domain: we can use that other domain and its gold standard to evaluate and compare different ontology learning algorithms and/or to tune their parameters, and then apply the resulting settings to the actual domain of interest, in the hope that the result will be a reasonable ontology even though we have no gold standard to compare it to.
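The precision/recall computation against a partial gold standard of concepts can be sketched as a simple set comparison. The function name and the concept labels below are made up for illustration.

```python
# Precision/recall of a learned concept list against a partial gold
# standard (a list of important concepts, without a hierarchy).

def concept_precision_recall(learned, gold):
    """Precision: fraction of learned concepts that are in the gold list.
    Recall: fraction of gold concepts recovered by the learned ontology."""
    learned, gold = set(learned), set(gold)
    overlap = learned & gold
    precision = len(overlap) / len(learned) if learned else 0.0
    recall = len(overlap) / len(gold) if gold else 0.0
    return precision, recall

gold = {"animal", "mammal", "bird", "fish"}
learned = {"animal", "mammal", "reptile"}
p, r = concept_precision_recall(learned, gold)
print(p, r)  # precision 2/3, recall 1/2
```

In practice, concept matching would likely need to tolerate lexical variation (synonyms, different labels for the same concept) rather than rely on exact string equality.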
However, approaches that completely avoid the need for a gold standard could
However, approaches that avoid the need for a gold standard altogether could also be considered. For "flat" partitions, traditional clustering commonly uses measures such as cluster compactness and inter-cluster distance: instances from the same cluster should be close to each other, while instances from different clusters should be as far apart as possible. Measures of this sort could be extended to hierarchical partitions as well. One could also envision using machine learning methods to evaluate a partition: the partition divides the set of instances into several disjoint classes, and we can try to learn a classification model for each class. If the partition of instances into classes is reasonable, one would expect the resulting classifiers to perform better than if the partition were essentially random or unrelated to the attributes of the instances.
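The classifier-based idea can be sketched as follows. A nearest-centroid learner stands in for the classification model, and the data points and names are invented for illustration: if the partition reflects the instances' attributes, leave-one-out accuracy on the true labels should beat the same learner on randomly shuffled labels.

```python
# Sketch of classifier-based partition evaluation: compare a simple
# classifier's accuracy on the given partition vs. a random relabeling.
import random

def loo_nearest_centroid_accuracy(points, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i, (x, y) in enumerate(zip(points, labels)):
        # Recompute centroids with point i held out.
        centroids = {}
        for lab in set(labels):
            members = [p for j, (p, l) in enumerate(zip(points, labels))
                       if l == lab and j != i]
            if members:
                dim = len(members[0])
                centroids[lab] = [sum(m[d] for m in members) / len(members)
                                  for d in range(dim)]
        pred = min(centroids,
                   key=lambda lab: sum((a - b) ** 2
                                       for a, b in zip(x, centroids[lab])))
        correct += pred == y
    return correct / len(points)

# Two well-separated clusters in 2-D.
points = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1),
          (5.0, 5.1), (5.1, 5.0), (5.2, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

random.seed(0)
shuffled = labels[:]
random.shuffle(shuffled)

print(loo_nearest_centroid_accuracy(points, labels))    # -> 1.0
print(loo_nearest_centroid_accuracy(points, shuffled))  # typically lower
```

Any off-the-shelf classifier and cross-validation scheme could play the same role; the point is only that predictability of the class labels from the attributes serves as a proxy for the quality of the partition.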
11.7 Acknowledgments
This work was supported by the Slovenian Research Agency and the IST Programme
of the European Community under SEKT Semantically Enabled Knowledge Technologies (IST-1-506826-IP) and PASCAL Network of Excellence (IST-2002-506778).
This publication only reflects the authors' views.