Thus, if $h$ is small, $\tanh(\beta h)$ is close to 0, whereas for a large $h$ it
becomes close to 1. It is reasonable to treat the case when the two concepts are the
same, i.e., when $U_i = U_j$ and thus $l = 0$, as a special case and to define
$\delta(0,h) = 1$, so that $\delta_U(U_i, U_i)$ does not depend on the depth of the
concept $U_i$.
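The behaviour described above can be sketched in a few lines of Python. This is a minimal illustration that assumes eq. (11.3), which is not reproduced in this excerpt, has the form $\delta(l,h) = e^{-\alpha l}\tanh(\beta h)$, as the discussion of $\tanh(\beta h)$ and of the special case $\delta(0,h) = 1$ suggests; the function name `delta_tanh` is hypothetical.

```python
import math


def delta_tanh(l: int, h: int, alpha: float, beta: float) -> float:
    """Hypothetical sketch of eq. (11.3), assumed to be exp(-alpha*l) * tanh(beta*h).

    l: length of the path between the two concepts in the hierarchy.
    h: depth of their deepest common ancestor (the root is at depth 0).
    """
    if l == 0:
        # Same concept: special case, so the value does not depend on its depth.
        return 1.0
    return math.exp(-alpha * l) * math.tanh(beta * h)


# Only the root is a common ancestor (h = 0): tanh(0) = 0, so delta is exactly 0.
print(delta_tanh(2, 0, alpha=0.3, beta=0.5))
# Deeper common ancestors push tanh(beta*h) toward 1.
print(delta_tanh(2, 1, alpha=0.3, beta=0.5), delta_tanh(2, 6, alpha=0.3, beta=0.5))
```

For identical concepts the function returns 1 regardless of `h`, which is exactly what the special case is meant to guarantee.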
Incidentally, if we set $\alpha$ to 0 (or close to 0) and $\beta$ to some large value,
$\delta(l,h)$ will be approximately 0 for $h = 0$ and approximately 1 for $h > 0$.
Consequently, in the sum used to define the OntoRand index (11.1), each pair of
instances contributes the value 1 if they have some common ancestor besides the root
in one ontology but not in the other; otherwise it contributes 0. The OntoRand index
therefore becomes equivalent to the ordinary Rand index computed over the partitions
of instances implied by the second-level concepts of the two ontologies (i.e., the
immediate subconcepts of the root concept). This can be taken as a warning that
$\alpha$ should not be too small and $\beta$ not too large; otherwise the OntoRand
index will ignore the structure of the lower levels of the ontologies.
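To make this degenerate case concrete, the Python sketch below computes OntoRand on two hypothetical toy ontologies and compares it with the ordinary Rand index of the partitions induced by the second-level concepts. It assumes that eq. (11.1), not reproduced in this excerpt, equals 1 minus the average of $|\delta_U - \delta_V|$ over all pairs of instances, and that eq. (11.3) has the form $e^{-\alpha l}\tanh(\beta h)$; all names (`onto_rand`, `rand_index`, the concept and instance labels) are illustrative only.

```python
import math
from itertools import combinations


def path_up(concept, parent):
    """[concept, its parent, ..., root]; `parent` maps each concept to its parent (root -> None)."""
    path = [concept]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def l_and_h(a, b, parent):
    """Return (l, h): path distance between a and b, depth of their deepest common ancestor."""
    pa, pb = path_up(a, parent), path_up(b, parent)
    on_b = set(pb)
    k = next(i for i, c in enumerate(pa) if c in on_b)  # steps from a up to the common ancestor
    h = len(pa) - 1 - k                                  # the root is at depth 0
    return k + pb.index(pa[k]), h


def delta(l, h, alpha, beta):
    """Assumed form of eq. (11.3), with delta(0, h) = 1 for identical concepts."""
    if l == 0:
        return 1.0
    return math.exp(-alpha * l) * math.tanh(beta * h)


def onto_rand(assign_u, parent_u, assign_v, parent_v, alpha, beta):
    """Assumed form of eq. (11.1): 1 minus the mean of |delta_U - delta_V| over instance pairs."""
    pairs = list(combinations(sorted(assign_u), 2))
    total = 0.0
    for i, j in pairs:
        d_u = delta(*l_and_h(assign_u[i], assign_u[j], parent_u), alpha, beta)
        d_v = delta(*l_and_h(assign_v[i], assign_v[j], parent_v), alpha, beta)
        total += abs(d_u - d_v)
    return 1.0 - total / len(pairs)


def rand_index(assign_u, parent_u, assign_v, parent_v):
    """Ordinary Rand index of the partitions induced by the second-level concepts."""
    def top(concept, parent):
        # Child of the root above `concept` (instances are never assigned to the root).
        return path_up(concept, parent)[-2]

    pairs = list(combinations(sorted(assign_u), 2))
    agree = sum(
        (top(assign_u[i], parent_u) == top(assign_u[j], parent_u))
        == (top(assign_v[i], parent_v) == top(assign_v[j], parent_v))
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical toy ontologies U and V over the same four instances.
parent_u = {"root": None, "A": "root", "B": "root", "A1": "A", "A2": "A", "B1": "B"}
assign_u = {"d1": "A1", "d2": "A2", "d3": "B1", "d4": "B"}
parent_v = {"root": None, "X": "root", "Y": "root", "X1": "X", "Y1": "Y", "Y2": "Y"}
assign_v = {"d1": "X1", "d2": "Y1", "d3": "Y2", "d4": "X"}

# alpha = 0 and a large beta: OntoRand collapses to the second-level Rand index.
print(onto_rand(assign_u, parent_u, assign_v, parent_v, alpha=0.0, beta=50.0))
print(rand_index(assign_u, parent_u, assign_v, parent_v))
# Moderate parameter values let the lower levels influence the result.
print(onto_rand(assign_u, parent_u, assign_v, parent_v, alpha=0.3, beta=0.5))
```

With $\alpha = 0$ and $\beta = 50$ the first two printed values agree up to floating-point rounding, whereas the third run gives a different value, because pairs whose deepest common ancestor lies at depth 1 no longer count as fully similar.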
The overlap-based version of $\delta_U$ from eq. (11.2) can also be defined in terms
of $h$ and $l$. If the root is taken to be at depth 0, then the intersection of
$A(U_i)$ and $A(U_j)$ contains $h + 1$ concepts, and the union of $A(U_i)$ and
$A(U_j)$ contains $h + l + 1$ concepts: the $h + 1$ common ancestors plus the $l$
concepts on the path between $U_i$ and $U_j$ that lie below their deepest common
ancestor. Thus, eq. (11.2) is equivalent to defining
$$\delta(l,h) = \frac{h+1}{h+l+1} \tag{11.4}$$
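The counting argument above can be checked mechanically on a small tree. The Python sketch below assumes that eq. (11.2) is the ratio $|A(U_i) \cap A(U_j)| / |A(U_i) \cup A(U_j)|$ and that $A(U)$ contains $U$ itself together with all its ancestors, as the counts $h + 1$ and $h + l + 1$ imply; the hierarchy and helper names are hypothetical. It compares the overlap measure with eq. (11.4) for every pair of concepts, using exact fractions to avoid rounding.

```python
from fractions import Fraction


def path_up(concept, parent):
    """[concept, its parent, ..., root]; `parent` maps each concept to its parent (root -> None)."""
    path = [concept]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def overlap_delta(a, b, parent):
    """Assumed eq. (11.2): |A(a) & A(b)| / |A(a) | A(b)|, where A(x) includes x itself."""
    anc_a, anc_b = set(path_up(a, parent)), set(path_up(b, parent))
    return Fraction(len(anc_a & anc_b), len(anc_a | anc_b))


def delta_lh(l, h):
    """Eq. (11.4): (h + 1) / (h + l + 1)."""
    return Fraction(h + 1, h + l + 1)


def l_and_h(a, b, parent):
    """(path distance between a and b, depth of their deepest common ancestor)."""
    pa, pb = path_up(a, parent), path_up(b, parent)
    on_b = set(pb)
    k = next(i for i, c in enumerate(pa) if c in on_b)  # steps from a up to the common ancestor
    return k + pb.index(pa[k]), len(pa) - 1 - k


# Hypothetical hierarchy: root -> A -> A1 -> A11, root -> A -> A2, root -> B.
parent = {"root": None, "A": "root", "B": "root", "A1": "A", "A2": "A", "A11": "A1"}

# The overlap measure and eq. (11.4) agree on every pair of concepts.
for a in parent:
    for b in parent:
        assert overlap_delta(a, b, parent) == delta_lh(*l_and_h(a, b, parent))

print(overlap_delta("A11", "A2", parent))  # h = 1, l = 3, prints 2/5
```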
Comparing eqs. (11.3) and (11.4) reveals a notable difference between the two
definitions of $\delta$: when $h = 0$, i.e., when the two instances have no common
ancestor except the root, eq. (11.3) returns $\delta = 0$, while eq. (11.4) returns
$\delta = 1/(l+1) > 0$. When comparing two ontologies, it may often happen that many
pairs of instances have no common ancestor (except the root) in either of the two
ontologies, i.e., $h_U = h_V = 0$, but the distance between their concepts is likely
to be different: $l_U \neq l_V$. In these cases, eq. (11.3) yields
$\delta_U = \delta_V = 0$, while eq. (11.4) yields $\delta_U \neq \delta_V$. When the
resulting values $|\delta_U - \delta_V|$ are used in eq. (11.1), we see that with
definition (11.3) many terms in the sum will be 0 and the OntoRand index will be close
to 1. For example, in our experiments with the Science subtree of dmoz.org
(Sec. 11.5.3), although the assignment of instances to concepts differed considerably
between the two ontologies, approx. 81% of instance pairs had $h_U = h_V = 0$ (and
only 3.2% of these additionally had $l_U = l_V$). Thus, when using the definition of
$\delta$ from eq. (11.3) (as opposed to the overlap-based definition from eq. (11.4)),
we must accept that most of the terms in the sum (11.1) will be 0 and the OntoRand
index will be close to 1. This does
not mean that the resulting values of OntoRand are not useful for assessing whether,
e.g., one ontology is closer to the gold standard than another ontology is, but it may
nevertheless appear confusing that OntoRand is always so close to 1. In this case a
possible alternative is to replace eq. (11.3) by
$$\delta(l,h) = e^{-\alpha l}\tanh(\beta h + 1) \tag{11.5}$$
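The effect of $h = 0$ on a single term $|\delta_U - \delta_V|$ of the sum (11.1) can be checked numerically. The Python sketch below, again assuming that eq. (11.3) has the form $e^{-\alpha l}\tanh(\beta h)$, evaluates the three definitions for a hypothetical pair of instances with $h_U = h_V = 0$ and $l_U \neq l_V$. The values $\alpha = 0.15$, $\beta = 0.25$ are the ones quoted for eq. (11.5) in this section and are otherwise arbitrary; only distinct concepts ($l \geq 1$) are handled, so the special case $l = 0$ is left out.

```python
import math


def delta_11_3(l, h, alpha, beta):
    """Assumed form of eq. (11.3), for distinct concepts (l >= 1)."""
    return math.exp(-alpha * l) * math.tanh(beta * h)


def delta_11_4(l, h):
    """Eq. (11.4), the overlap-based definition."""
    return (h + 1) / (h + l + 1)


def delta_11_5(l, h, alpha, beta):
    """Eq. (11.5), for distinct concepts (l >= 1)."""
    return math.exp(-alpha * l) * math.tanh(beta * h + 1)


# A pair of instances whose only common ancestor is the root in both ontologies
# (h_U = h_V = 0), but whose concepts are 2 apart in U and 4 apart in V.
alpha, beta = 0.15, 0.25
l_u, l_v, h = 2, 4, 0

print("eq. (11.3):", abs(delta_11_3(l_u, h, alpha, beta) - delta_11_3(l_v, h, alpha, beta)))
print("eq. (11.4):", abs(delta_11_4(l_u, h) - delta_11_4(l_v, h)))
print("eq. (11.5):", abs(delta_11_5(l_u, h, alpha, beta) - delta_11_5(l_v, h, alpha, beta)))
```

The first term is exactly 0, so such a pair cannot lower OntoRand under eq. (11.3), whereas eqs. (11.4) and (11.5) both give it a positive contribution.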
The family of $\delta$-functions defined by (11.5) can be seen as a generalization (in
a loose sense) of the $\delta$-function from formula (11.4). For example, we compared
the values of $\delta$ produced by these two definitions on a set of 106 random pairs
of documents from the dmoz.org Science subtree. For a suitable choice of $\alpha$ and
$\beta$, definition (11.5) can be made to produce values of $\delta$ that are very
closely correlated with those of definition (11.4) (e.g., correlation coefficient
0.995 for $\alpha = 0.15$, $\beta = 0.25$). Similarly,