related clusters in at least one of the two ontologies, and computing the exact sum
for these pairs, while disregarding the remaining pairs (or processing them using the
subsampling technique from the previous paragraph). For example, suppose that δ_U
is defined by eq. (11.4) as δ(l, h) = (h + 1)/(h + l + 1). Thus, we need to find pairs
of concepts for which (h + 1)/(h + l + 1) is greater than some threshold ε. (Then
we will know that detailed processing is advisable for pairs of instances which fall
into one of these pairs of concepts.) The condition (h + 1)/(h + l + 1) > ε can be
rewritten as l < (h + 1)(1/ε − 1). Thus, suitable pairs of concepts could be identified
by the following algorithm:
  Initialize P := {}.
  For each concept c:
    Let h be the depth of c, and let L = (h + 1)(1/ε − 1).
    Denote the children of c (its immediate subconcepts) by c_1, ..., c_r.
    For each l from 1 to L, for each i from 1 to r, let S_{l,i} be the set of
      those subconcepts of c that are also subconcepts of c_i
      and are l levels below c in the tree.
    For each l from 1 to L, for each i from 1 to r,
      add to P all the pairs from S_{l,i} × (∪_{l′ ≤ L − l} ∪_{i′ ≠ i} S_{l′,i′}).
In each iteration of the outermost loop, the algorithm processes a concept c and
discovers all pairs of concepts c′, c″ such that c is the deepest common ancestor of
c′ and c″ and δ_U(c′, c″) > ε. For more efficient maintenance of the S_{l,i} sets, it might
be advisable to process the concepts c in a bottom-up manner, since the sets for a
parent concept can be obtained by merging appropriate sets of its children.
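
The pair-discovery step above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes that the ontology is a tree given as a dictionary from each concept to its list of children, that the root has depth 0, and that in δ(l, h) the argument l is the path length between the two concepts and h is the depth of their deepest common ancestor. The names delta, levels_below and close_pairs are hypothetical. Instead of comparing l + l′ against L, the sketch tests δ(l + l′, h) > ε directly, which is the condition the bound L was derived from. It rebuilds the level sets for every concept rather than merging them bottom-up.

```python
from collections import defaultdict
from itertools import combinations


def delta(l, h):
    """delta(l, h) = (h + 1) / (h + l + 1), as quoted from eq. (11.4)."""
    return (h + 1) / (h + l + 1)


def levels_below(children, node, max_level):
    """Map level -> concepts in the subtree of `node`, with `node` at level 1.

    Levels beyond `max_level` are not explored.
    """
    by_level = defaultdict(set)
    frontier, level = [node], 1
    while frontier and level <= max_level:
        by_level[level].update(frontier)
        frontier = [g for n in frontier for g in children.get(n, [])]
        level += 1
    return by_level


def close_pairs(children, root, eps):
    """Collect pairs from different child subtrees of a concept c with delta > eps."""
    pairs = set()
    stack = [(root, 0)]  # (concept, depth); root depth 0 is an assumption
    while stack:
        c, h = stack.pop()
        L = (h + 1) * (1 / eps - 1)  # upper bound on the path length
        kids = children.get(c, [])
        stack.extend((k, h + 1) for k in kids)
        # S[i][l] plays the role of S_{l,i}: subtree of child i, l levels below c
        S = [levels_below(children, k, int(L)) for k in kids]
        for S_i, S_j in combinations(S, 2):
            for l, left in S_i.items():
                for l2, right in S_j.items():
                    if delta(l + l2, h) > eps:
                        pairs.update(frozenset((a, b)) for a in left for b in right)
    return pairs


tree = {"root": ["A", "B"], "A": ["A1", "A2"], "B": ["B1"]}
result = close_pairs(tree, "root", 0.3)
```

With ε = 0.3 this small tree yields the sibling pairs {A, B} and {A1, A2}. Like the pseudocode, the sketch only pairs concepts that lie in different child subtrees of c, so pairs in which one concept is an ancestor of the other are not enumerated.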
For the time being, we have tested only random sampling of pairs as outlined at the
beginning of this subsection. Separate treatment of pairs with (h + 1)/(h + l + 1) > ε
will be the topic of future work.
11.5 Evaluation of the Proposed Approach
We evaluate the proposed approach to automatic ontology evaluation by showing its
output in several concrete situations, so that the reader can see how the approach
behaves given a well-defined mismatch between two ontologies (the learned
ontology and the “gold-standard” ontology). Namely, instead of learning an ontology
and then evaluating it, we take the “gold-standard” ontology, introduce some errors
into it, and use the result to simulate the learned ontology. We have defined several simple and
intuitive operations for introducing errors into the “gold-standard” ontology. The aim
is to illustrate the kinds of mismatch that can occur between the learned ontology
and the “gold-standard” ontology, and their influence on the score given by the
proposed OntoRand index. The following operations are used in our
evaluation of the proposed approach:
• Removing lower levels of the tree: deleting all concepts below a certain depth
  in the tree (see Section 11.5.1).
• Swapping a concept and its parent (see Section 11.5.2).
• Reassigning instances to concepts based on their associated natural language
  text (see Section 11.5.3).
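
The first two operations can be illustrated with a short Python sketch. It is only one possible reading of the descriptions above. It assumes that the ontology is stored as a dictionary from each concept to its parent (None for the root, which has depth 0) and that instances are stored as a dictionary from instance to concept. The function names are hypothetical. Two choices are assumptions made for illustration: instances of a deleted concept are moved to its nearest surviving ancestor, and swapping is done by exchanging the positions of the two concepts in the tree. The third operation depends on text processing details given in Section 11.5.3 and is not sketched here.

```python
def depth(parent, c):
    """Depth of concept c; the root (parent None) has depth 0."""
    d = 0
    while parent[c] is not None:
        c = parent[c]
        d += 1
    return d


def remove_lower_levels(parent, assignment, max_depth):
    """Delete all concepts deeper than max_depth.

    Assumption: instances of a deleted concept are reassigned to its
    nearest ancestor that survives the cut.
    """
    kept = {c: p for c, p in parent.items() if depth(parent, c) <= max_depth}
    new_assignment = {}
    for instance, c in assignment.items():
        while c not in kept:  # climb until a surviving concept is reached
            c = parent[c]
        new_assignment[instance] = c
    return kept, new_assignment


def swap_with_parent(parent, c):
    """Exchange the tree positions of c and its parent; other links are kept."""
    p = parent[c]
    if p is None:
        raise ValueError("the root has no parent to swap with")

    def relabel(n):
        # c takes the place of p and vice versa; other concepts are unchanged
        if n == p:
            return c
        if n == c:
            return p
        return n

    return {relabel(n): (None if q is None else relabel(q))
            for n, q in parent.items()}


gold = {"root": None, "A": "root", "B": "root", "A1": "A", "A2": "A"}
instances = {"x": "A1", "y": "B"}

pruned, moved = remove_lower_levels(gold, instances, 1)
swapped = swap_with_parent(gold, "A1")
```

Here pruned keeps only root, A and B, and instance x moves from A1 to A. In swapped, A1 sits directly under the root, while A and A2 are its children. Each modified tree can then play the role of the learned ontology and be compared against the unmodified one.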