Table 11.1. An overview of approaches to ontology evaluation on different levels.
                                      Approach to ontology evaluation
Level                                 Gold standard   Application-based   Data-driven   Assessment by humans
Lexical, vocabulary, concept, data    ×               ×                   ×             ×
Hierarchy, taxonomy                   ×               ×                   ×             ×
Other semantic relations              ×               ×                   ×             ×
Context (repository/application)      -               ×                   -             ×
Syntactic                             ×¹              -                   -             ×
Structure, architecture, design       -               -                   -             ×

¹ “Gold standard” in the sense of comparing the syntax in the ontology definition with the syntax
specification of the formal language in which the ontology is written (e.g., RDF, OWL, etc.).

11.2.2 Evaluation on the Lexical/Vocabulary and Concept/Data Level
An example of an approach that can be used for the evaluation of the lexical/vocabulary
level of an ontology is the one proposed by Maedche and Staab [16]. Similarity
between two strings is measured based on the Levenshtein edit distance [14],
normalized to produce scores in the range [0, 1]. Sometimes background knowledge about
the domain can be used to introduce an improved domain-specific definition of the
edit distance; for example, when comparing names of persons, one might take into
account the fact that first names are often abbreviated [6]. A string matching
measure between two sets of strings is then defined by taking each string of the first set,
finding its similarity to the most similar string in the second set, and averaging this
over all strings of the first set. One may take the set of all strings used as concept
identifiers in the ontology being evaluated, and compare it to a “gold standard” set
of strings that are considered a good representation of the concepts of the problem
domain under consideration. The gold standard could in fact be another ontology
(as in Maedche and Staab's work), or it could be derived statistically from a corpus
of documents (see Section 11.2.4), or prepared by domain experts.
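
The string matching measure described above can be sketched in Python as follows. This is only an illustration, not Maedche and Staab's implementation: the text does not spell out how the edit distance is normalized, so the sketch assumes similarity = 1 - distance / (length of the longer string). The function names levenshtein, string_similarity, and set_similarity are hypothetical.

```python
from typing import Collection


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def string_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]. ASSUMPTION: normalize by the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest


def set_similarity(first: Collection[str], second: Collection[str]) -> float:
    """For each string in `first`, take its best match in `second`,
    then average over all strings of `first`."""
    if not first or not second:
        return 0.0  # ASSUMPTION: undefined cases score zero
    best_scores = [max(string_similarity(s, t) for t in second) for s in first]
    return sum(best_scores) / len(best_scores)


# Hypothetical concept identifiers, compared against a hypothetical gold standard.
ontology_terms = {"hotel", "accomodation"}
gold_terms = {"hotel", "accommodation"}
score = set_similarity(ontology_terms, gold_terms)
```

Note that the measure is not symmetric: it averages over the first set only, so set_similarity(A, B) and set_similarity(B, A) can differ when the two sets have different sizes or contents.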
The lexical content of an ontology can also be evaluated using the concepts of
precision and recall, as known in information retrieval. In this context, precision
is the percentage of the ontology's lexical entries (strings used as concept
identifiers) that also appear in the gold standard, relative to the total number of
lexical entries in the ontology. Recall is the percentage of the gold standard lexical
entries that also appear as concept identifiers in the ontology, relative to the total
number of gold standard lexical entries. A downside of the precision and recall measures
defined in this way is that they do not allow for minor differences in spelling (e.g.,
the use of hyphens in multi-word phrases). Besides similarity based on edit distance,
another way to achieve more tolerant matching criteria [2] is to augment each lexical
entry with its hypernyms from WordNet or some similar resource; then, instead of testing
for equality of two lexical entries, one can test for overlap between their corresponding
sets of words (each set containing an entry with its hypernyms).
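
As a rough illustration of the two matching criteria just described, the following Python sketch computes lexical precision and recall with a pluggable matching test. All names are hypothetical, and TOY_HYPERNYMS is a small hand-made table standing in for WordNet or a similar resource; it is not real WordNet data.

```python
from typing import Callable, Collection, Dict, Set, Tuple


def exact_match(ontology_term: str, gold_term: str) -> bool:
    """Strict criterion: the two lexical entries must be identical."""
    return ontology_term == gold_term


def precision_recall(
    ontology_terms: Collection[str],
    gold_terms: Collection[str],
    matches: Callable[[str, str], bool] = exact_match,
) -> Tuple[float, float]:
    """Return (precision, recall) of the ontology's lexical entries."""
    ontology, gold = set(ontology_terms), set(gold_terms)
    if not ontology or not gold:
        return 0.0, 0.0  # ASSUMPTION: undefined cases score zero
    # Ontology entries that have a counterpart in the gold standard.
    matched_ontology = sum(1 for o in ontology if any(matches(o, g) for g in gold))
    # Gold standard entries that have a counterpart in the ontology.
    matched_gold = sum(1 for g in gold if any(matches(o, g) for o in ontology))
    return matched_ontology / len(ontology), matched_gold / len(gold)


# HYPOTHETICAL hypernym table; a real setup would query WordNet or similar.
TOY_HYPERNYMS: Dict[str, Set[str]] = {
    "hotel": {"lodging", "building"},
    "hostel": {"lodging", "building"},
    "car": {"vehicle"},
}


def expand(term: str) -> Set[str]:
    """An entry together with its hypernyms."""
    return {term} | TOY_HYPERNYMS.get(term, set())


def overlap_match(ontology_term: str, gold_term: str) -> bool:
    """Tolerant criterion: the expanded word sets share at least one word."""
    return bool(expand(ontology_term) & expand(gold_term))


ontology_terms = {"hotel", "car", "e-mail"}
gold_terms = {"hostel", "car", "email"}

strict = precision_recall(ontology_terms, gold_terms)
tolerant = precision_recall(ontology_terms, gold_terms, matches=overlap_match)
```

In this toy data, strict matching pairs only "car" with "car", while the overlap test also pairs "hotel" with "hostel" through their shared hypernyms. Neither criterion pairs "e-mail" with "email", which is the kind of spelling variation an edit-distance similarity would be better suited to catch.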
The same approaches could also be used to evaluate the lexical content of an
ontology on other levels, e.g., the strings used to identify relations, instances, etc.
 