Automatic Evaluation of Ontologies - Natural Language Processing and Text Mining

related clusters in at least one of the two ontologies, and computing the exact sum for these pairs, while disregarding the remaining pairs (or processing them using the subsampling technique from the previous paragraph). For example, suppose that δ_U is defined by eq. (11.4) as δ(l, h) = (h + 1)/(h + l + 1). Thus, we need to find pairs of concepts for which (h + 1)/(h + l + 1) is greater than some threshold ε. (Then we will know that detailed processing is advisable for pairs of instances which fall into one of these pairs of concepts.) The condition (h + 1)/(h + l + 1) > ε can be rewritten as l < (h + 1)(1/ε − 1). Thus, suitable pairs of concepts could be identified by the following algorithm:

Initialize P := {}.
For each concept c:
    Let h be the depth of c, and let L = (h + 1)(1/ε − 1).
    Denote the children of c (its immediate subconcepts) by c_1, ..., c_r.
    For each l from 1 to L, for each i from 1 to r, let S_{l,i} be the set of
        those subconcepts of c that are also subconcepts of c_i
        and are l levels below c in the tree.
    For each l from 1 to L, for each i from 1 to r,
        add to P all the pairs from S_{l,i} × (∪_{l′ ≤ L − l} ∪_{i′ ≠ i} S_{l′,i′}).

In each iteration of the outermost loop, the algorithm processes a concept c and discovers all pairs of concepts c′, c″ such that c is the deepest common ancestor of c′ and c″ and δ_U(c′, c″) > ε. For more efficient maintenance of the S_{l,i} sets, it might be advisable to process the concepts c in a bottom-up manner, since the sets for a parent concept can be obtained by merging appropriate sets of its children.
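The bottom-up variant described above can be sketched in Python as follows. This is only an illustrative sketch under several assumptions that are not part of the original text: the tree is given as a hypothetical `children` dictionary mapping each concept name to a list of its children, the root has depth 0, and the threshold is tested directly as (h + 1)/(h + l + 1) > ε (with l the sum of the two level offsets) rather than by precomputing L. The function name `find_close_pairs` is likewise made up for this example.

```python
def find_close_pairs(children, root, eps):
    """Collect pairs of concepts lying under different children of their
    deepest common ancestor c and satisfying (h + 1) / (h + l + 1) > eps,
    where h is the depth of c (root assumed at depth 0) and l is the
    number of tree edges between the two concepts."""
    pairs = set()

    def visit(c, h):
        # One dict per child c_i of c, mapping level l -> S_{l,i}.
        per_child = []
        for child in children.get(c, []):
            below = visit(child, h + 1)
            # The child itself is 1 level below c; its own descendants
            # move one level further down when seen from c.
            levels = {1: {child}}
            for l, concepts in below.items():
                levels[l + 1] = concepts
            per_child.append(levels)

        # Pair up concepts from two different children of c.
        for i in range(len(per_child)):
            for j in range(i + 1, len(per_child)):
                for l1, left in per_child[i].items():
                    for l2, right in per_child[j].items():
                        # Same condition as l1 + l2 < (h + 1)(1/eps - 1).
                        if (h + 1) > eps * (h + l1 + l2 + 1):
                            for a in left:
                                for b in right:
                                    pairs.add(tuple(sorted((a, b))))

        # Merge the children's sets so the parent of c can reuse them.
        merged = {}
        for levels in per_child:
            for l, concepts in levels.items():
                merged.setdefault(l, set()).update(concepts)
        return merged

    visit(root, 0)
    return pairs


# Hypothetical toy ontology used only for illustration.
toy = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}
close = find_close_pairs(toy, "root", eps=0.4)
```

For simplicity the sketch keeps every level of every subtree; a more careful version could discard levels beyond the largest L that any ancestor will need.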

For the time being, we have tested random sampling of pairs as outlined at the beginning of this subsection. Separate treatment of pairs with (h + 1)/(h + l + 1) > ε will be the topic of future work.

11.5 Evaluation of the Proposed Approach

We evaluate the proposed approach to automatic ontology evaluation by showing its output in several concrete situations, so that the reader can see how the score responds to a well-defined mismatch between the two ontologies (the learned ontology and the “gold-standard” ontology). Namely, instead of learning an ontology and then evaluating it, we take the “gold-standard” ontology, introduce some errors into it, and use the result to simulate the learned ontology. We have defined several simple and intuitive operations for introducing errors into the “gold-standard” ontology. The aim is to illustrate the kinds of mismatch that can occur between the learned ontology and the “gold-standard” ontology, and their influence on the evaluation score given by the proposed OntoRand index. The following operations are used in our evaluation of the proposed approach:

• Removing lower levels of the tree: deleting all concepts below a certain depth in the tree (see Section 11.5.1).

• Swapping a concept and its parent (see Section 11.5.2).

• Reassigning instances to concepts based on their associated natural language text (see Section 11.5.3).
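To make the first two operations concrete, here is a minimal Python sketch. It is not the authors' implementation, and it rests on assumptions not stated in this passage: the ontology is stored as a hypothetical `parent` dictionary (concept name to parent name, `None` for the root) plus an `instances` dictionary (concept name to set of instances); instances of deleted concepts are assumed to move to the nearest surviving ancestor; and "swapping" is assumed to mean that the two concepts exchange positions in the tree while keeping their own instances. The function names are invented for this example.

```python
def remove_lower_levels(parent, instances, max_depth):
    """Delete all concepts deeper than max_depth (root has depth 0).
    Assumption: instances of deleted concepts are reassigned to the
    nearest ancestor that survives."""
    def depth(c):
        d = 0
        while parent[c] is not None:
            c = parent[c]
            d += 1
        return d

    new_parent = {c: p for c, p in parent.items() if depth(c) <= max_depth}
    new_instances = {c: set() for c in new_parent}
    for c, items in instances.items():
        # Walk upwards until we reach a concept that was kept.
        while c not in new_parent:
            c = parent[c]
        new_instances[c] |= items
    return new_parent, new_instances


def swap_with_parent(parent, concept):
    """Exchange the tree positions of `concept` and its parent.
    Assumption: each concept keeps its name and instances; only the
    positions in the hierarchy are exchanged."""
    p = parent[concept]
    if p is None:
        raise ValueError("the root has no parent to swap with")
    rename = {concept: p, p: concept}
    return {
        rename.get(c, c): (None if q is None else rename.get(q, q))
        for c, q in parent.items()
    }


# Hypothetical toy "gold-standard" ontology.
gold_parent = {"root": None, "a": "root", "b": "root", "a1": "a", "a2": "a"}
gold_instances = {"a": {1}, "a1": {2}, "a2": {3}, "b": {4}}

cut_parent, cut_instances = remove_lower_levels(gold_parent, gold_instances, 1)
swapped_parent = swap_with_parent(gold_parent, "a1")
```

Either result can then play the role of the simulated learned ontology and be compared against the unmodified tree.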
