Automatic Evaluation of Ontologies - Natural Language Processing and Text Mining


Table 11.1. An overview of approaches to ontology evaluation on different levels.

Approach to ontology evaluation: Gold standard; Application-based; Data-driven; Assessment by humans

Level
Lexical, vocabulary, concept, data: ×
Hierarchy, taxonomy: ×
Other semantic relations: ×
Context (repository/application): ×
Syntactic: ×¹ ×
Structure, architecture, design: ×

¹ “Gold standard” in the sense of comparing the syntax in the ontology definition with the syntax specification of the formal language in which the ontology is written (e.g., RDF, OWL, etc.).

11.2.2 Evaluation on the Lexical/Vocabulary and Concept/Data Level

An example of an approach that can be used to evaluate the lexical/vocabulary level of an ontology is the one proposed by Maedche and Staab [16]. Similarity between two strings is measured based on the Levenshtein edit distance [14], normalized to produce scores in the range [0, 1]. Sometimes background knowledge about the domain can be used to introduce an improved domain-specific definition of the edit distance; for example, when comparing names of persons, one might take into account the fact that first names are often abbreviated [6]. A string matching measure between two sets of strings is then defined by taking each string of the first set, finding its similarity to the most similar string in the second set, and averaging this over all strings of the first set. One may take the set of all strings used as concept identifiers in the ontology being evaluated, and compare it to a “gold standard” set of strings that are considered a good representation of the concepts of the problem domain under consideration. The gold standard could in fact be another ontology (as in Maedche and Staab's work), or it could be derived statistically from a corpus of documents (see Section 11.2.4), or prepared by domain experts.
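The set-level string matching measure described above can be sketched in a few lines of Python. This is an illustrative sketch, not the exact formulation of Maedche and Staab: the function names are hypothetical, and normalizing the edit distance by the length of the longer string is an assumption made here to obtain scores in [0, 1].

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def string_similarity(a: str, b: str) -> float:
    """Map edit distance to [0, 1]; dividing by the longer length is an assumption."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest


def set_similarity(first, second) -> float:
    """Average, over the strings of `first`, of the best similarity found in `second`."""
    first, second = list(first), list(second)
    if not first or not second:
        return 0.0
    best_scores = [max(string_similarity(s, t) for t in second) for s in first]
    return sum(best_scores) / len(first)


# Hypothetical concept identifiers, for illustration only.
ontology_terms = ["hotel", "accomodation", "city"]
gold_terms = ["hotel", "accommodation", "town"]
print(set_similarity(ontology_terms, gold_terms))
```

Note that the measure is asymmetric: swapping the two sets generally changes the score, because the average is taken over the first set only.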

The lexical content of an ontology can also be evaluated using the concepts of precision and recall, as known in information retrieval. In this context, precision is the percentage of the ontology's lexical entries (strings used as concept identifiers) that also appear in the gold standard, relative to the total number of lexical entries in the ontology. Recall is the percentage of the gold standard lexical entries that also appear as concept identifiers in the ontology, relative to the total number of gold standard lexical entries. A downside of the precision and recall measures defined in this way is that they do not allow for minor differences in spelling (e.g., use of hyphens in multi-word phrases, etc.). Another way to achieve more tolerant matching criteria [2] is to augment each lexical entry with its hypernyms from WordNet or some similar resource; then, instead of testing for equality of two lexical entries, one can test for overlap between their corresponding sets of words (each set containing an entry with its hypernyms).
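As a rough illustration of these definitions, the sketch below computes precision and recall first with exact string equality and then with the more tolerant overlap test. The function names are hypothetical, and `TOY_HYPERNYMS` is a hand-made stand-in for a lookup in WordNet or a similar resource; it is not real WordNet data. Scores are returned as fractions in [0, 1] rather than percentages.

```python
def precision_recall(ontology_entries, gold_entries, match=lambda a, b: a == b):
    """Lexical precision and recall under a pluggable matching criterion."""
    onto, gold = set(ontology_entries), set(gold_entries)
    matched_onto = sum(1 for o in onto if any(match(o, g) for g in gold))
    matched_gold = sum(1 for g in gold if any(match(o, g) for o in onto))
    precision = matched_onto / len(onto) if onto else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical hypernym table standing in for a WordNet-like resource.
TOY_HYPERNYMS = {
    "hotel": {"lodging", "building"},
    "inn": {"lodging"},
    "city": {"municipality"},
    "river": {"stream"},
}


def expanded(entry, hypernyms):
    """An entry together with its hypernyms (just the entry if none are known)."""
    return {entry} | set(hypernyms.get(entry, ()))


def overlap_match(hypernyms):
    """Build a criterion that accepts two entries if their expanded sets overlap."""
    def match(a, b):
        return bool(expanded(a, hypernyms) & expanded(b, hypernyms))
    return match


ontology_terms = ["hotel", "city", "e-mail"]
gold_terms = ["inn", "city", "email", "river"]

exact = precision_recall(ontology_terms, gold_terms)
tolerant = precision_recall(ontology_terms, gold_terms,
                            match=overlap_match(TOY_HYPERNYMS))
print(exact, tolerant)
```

In this toy data, "hotel" and "inn" match only under the overlap criterion because they share a hypernym, while "e-mail" and "email" match under neither, which illustrates the spelling sensitivity noted above.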

The same approaches could also be used to evaluate the lexical content of an ontology on other levels, e.g., the strings used to identify relations, instances, etc.
