Information Technology Reference
In-Depth Information
when only one level remains and h drops to 0 even for these pairs of instances;
thus δ doesn't increase when we move from two levels to one: it drops to 0 instead,
causing the overall OntoRand similarity to grow again. This non-monotonicity could
be addressed by modifying the formula (11.3) somewhat, but it doesn't really have a
large practical impact anyway, as in a practical setting the ontology to be compared
to the gold standard would certainly have more than one level.
11.5.2 Swapping a Concept and Its Parent
This operation on trees is sometimes known as “rotation.” Consider a concept c
and its parent concept c′. This operation swaps c and c′ so that c′ becomes the
child of c; all other children of c′, which were formerly the siblings of c, are now
the grandchildren of c; all the children of c, which were formerly the grandchildren
of c′, are now the siblings of c′. If c′ formerly had a parent c″, then c″ is now the
parent of c, not of c′. The result of this operation is a tree such as might be obtained by
an automated ontology construction algorithm that proceeds in a top-down fashion
but does not split the set of instances correctly (e.g., instead of splitting the set of
instances related to science into those related to physics, chemistry, biology, etc.,
and then splitting the “physics” cluster into mechanics, thermodynamics, nuclear
physics, etc., it might have split the “science” cluster into mechanics, thermodynam-
ics, nuclear physics, and “miscellaneous,” where the last group would later be split
into chemistry, biology, etc.). How does this operation affect the values of h and l
used in eqs. (11.2) and (11.3)? For two concepts that were originally both in the
subtree rooted at c, the value of h decreases by 1; if they were both in the subtree
of c′ but not in the subtree of c, the value of h increases by 1; if one was in the
subtree of c and the other outside the subtree of c′, the value of l decreases by 1; if
one was in the subtree of c′ but not in the subtree of c, and the other was outside
the subtree of c′, the value of l increases by 1; otherwise, nothing changes. The last
case includes in particular all those pairs of instances where neither belonged to the
subtree rooted at c′ in the original ontology; this means the vast majority of pairs
(unless the subtree of c′ was very large). Thus the disagreement in the placement
of documents is usually quite small for an operation of this type, and OntoRand
is close to 1. This phenomenon is even more pronounced when using the similarity
measure based on tree distance (eq. 11.3) instead of the overlap measure (eq. 11.2).
Therefore, in the charts below (Figures 11.2 and 11.3), we show only the results for
the overlap measure and we show 1 − OntoRand instead of OntoRand itself.
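
The effect of the swap on h and l can be checked on a toy tree. The following is a minimal Python sketch, not the authors' implementation. It assumes the ontology is stored as a child-to-parent map, that h is the depth of the deepest common ancestor (root at depth 0), and that l is the number of edges on the path between the two concepts; the exact definitions behind eqs. (11.2) and (11.3) may differ in detail. The function names and the Arts/Music branch are hypothetical; the other category names follow the science example in the text.

```python
def swap_with_parent(parent, c):
    """Return a new child->parent map in which c and its parent are swapped."""
    p = parent[c]
    if p is None:
        raise ValueError("the root has no parent to swap with")
    new_parent = dict(parent)
    new_parent[c] = parent[p]  # the former grandparent (or None) now parents c
    new_parent[p] = c          # the former parent becomes a child of c
    # Children of c and the other children of p keep their parent pointers,
    # which yields exactly the sibling/grandchild changes described above.
    return new_parent


def path_to_root(parent, node):
    """List of nodes from `node` up to the root, inclusive."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def h_and_l(parent, a, b):
    """Assumed definitions: h = depth of deepest common ancestor, l = path length."""
    path_a = path_to_root(parent, a)
    path_b = path_to_root(parent, b)
    on_path_b = set(path_b)
    # First node on a's upward path that also lies on b's upward path.
    common = next(n for n in path_a if n in on_path_b)
    h = len(path_to_root(parent, common)) - 1
    l = path_a.index(common) + path_b.index(common)
    return h, l


# Toy ontology; Arts and Music are hypothetical additions for illustration.
before = {
    "Top": None,
    "Science": "Top", "Arts": "Top",
    "Physics": "Science", "Chemistry": "Science", "Biology": "Science",
    "Mechanics": "Physics", "Thermodynamics": "Physics",
    "Music": "Arts",
}
after = swap_with_parent(before, "Physics")  # swap Physics with Science

pairs = [
    ("Mechanics", "Thermodynamics"),  # both under Physics
    ("Chemistry", "Biology"),         # both under Science, not under Physics
    ("Mechanics", "Music"),           # one under Physics, one outside Science
    ("Chemistry", "Music"),           # one under Science only, one outside Science
    ("Mechanics", "Chemistry"),       # one under Physics, one under Science only
]
for a, b in pairs:
    print(a, b, h_and_l(before, a, b), "->", h_and_l(after, a, b))
```

On this toy tree the five pairs cover the four cases listed above plus one "otherwise" pair, so the printed (h, l) values before and after the swap can be compared directly against the case analysis.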
We performed 640 experiments with this operation, each using one of the 640
third-level categories as the category c (e.g., swapping Top/Science/Physics and
Top/Science, etc.).
Figure 11.2 shows that the dissimilarity of the ontology after rotation to the
original ontology grows with the size of the parent subtree of c, while Figure 11.3
shows that this dissimilarity decreases with the size of c's own subtree. This is
reasonable: the more instances there are in c's subtree, the less different it is from
its parent, and the less the ontology has changed due to the rotation. For instance,
the topmost group of “×” symbols on both charts of Figure 11.3 corresponds to
experiments where c was one of the subcategories of the largest second-level category,
Top/World. As the right chart in Figure 11.3 shows, the dissimilarity is almost
linearly proportional to the difference between the size of the parent subtree and
that of the subtree rooted at c.
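
One rough way to see how the subtree sizes enter is to count how many instance pairs fall into each case of the analysis earlier in this section. The sketch below is an illustration under stated assumptions, not a reproduction of the experiments: it assumes every instance belongs to exactly one place in the tree, and the function name `pair_counts` and the sizes used in the example are hypothetical. It counts only which pairs are touched by the swap, not how much δ or OntoRand changes for them.

```python
from math import comb


def pair_counts(n, p, s):
    """Count instance pairs per case of the swap analysis.

    n: total number of instances in the ontology
    p: instances in the subtree of the parent concept (including c's subtree)
    s: instances in the subtree of c itself
    """
    if not 0 <= s <= p <= n:
        raise ValueError("expected 0 <= s <= p <= n")
    rest = p - s      # under the parent but not under c
    outside = n - p   # outside the parent's subtree
    return {
        "h_decreases": comb(s, 2),        # both under c
        "h_increases": comb(rest, 2),     # both under the parent only
        "l_decreases": s * outside,       # one under c, one outside
        "l_increases": rest * outside,    # one under the parent only, one outside
        "unchanged": s * rest + comb(outside, 2),
    }


# Hypothetical sizes, chosen only to illustrate the bookkeeping.
counts = pair_counts(n=100, p=10, s=4)
for case, number in counts.items():
    print(f"{case}: {number}")
```

The five counts always add up to the total number of pairs, n(n − 1)/2, which gives a quick sanity check on the case analysis.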