Fig. 11.2. Evaluation of ontologies where a concept c has been swapped with its
parent. The chart shows one symbol for each choice of c. The number of instances
in the parent subtree (the one rooted by c's parent) is used as the x-coordinate, and
the dissimilarity after rotation is used as the y-coordinate.
As we can see, dissimilarity tends to grow approximately linearly with the size of
the parent subtree. The groups of symbols on the right represent experiments where
c was the child of one of the two largest second-level categories (Top/World and
Top/Regional).
Fig. 11.3. Evaluation of ontologies where a concept c has been swapped with its
parent. These charts explore the connection between dissimilarity and the number
of instances in c's own subtree. Again each choice of c is represented by one symbol
(whose shape depends on the number of instances in the subtree rooted by c's
parent). In the left chart, the x-coordinate is the number of instances in c's own
subtree; in the right chart, the x-coordinate is the difference in the number of instances
between the parent's and c's own subtree.
11.5.3 Reassignment of Instances to Concepts
In the dmoz ontology, each instance is really a short natural-language document
consisting of a web page title and description (usually 10-20 words). In this scenario,
we follow the standard practice from the field of information retrieval and represent
each document by a normalized TF-IDF vector. Based on these vectors, we compute
the centroid of each concept, i.e., the average of the vectors of all documents that
belong to this concept or to any of its direct or indirect subconcepts. The cosine of the angle
between a document vector and a concept centroid vector is a measure of how
closely the topic of the document matches the topic of the concept (as defined by
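
The representation described in this section can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy documents, the concept names, the helper functions (tfidf_vectors, subtree, centroid, cosine) and the particular IDF variant, log(N / df), are all assumptions made for illustration. It builds L2-normalized TF-IDF vectors, averages them over a concept and all of its direct and indirect subconcepts, and compares a document with a centroid by cosine similarity.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Map each doc id to an L2-normalized sparse TF-IDF vector (term -> weight).

    Assumption: raw term frequency times log(N / df); other variants exist.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    vectors = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        vec = {t: c * math.log(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors[doc_id] = {t: w / norm for t, w in vec.items()} if norm else {}
    return vectors


def subtree(concept, children):
    """Yield the concept and all of its direct and indirect subconcepts."""
    stack = [concept]
    while stack:
        current = stack.pop()
        yield current
        stack.extend(children.get(current, []))


def centroid(concept, children, members, vectors):
    """Average the vectors of all documents in the subtree rooted at concept."""
    total = defaultdict(float)
    count = 0
    for c in subtree(concept, children):
        for doc_id in members.get(c, []):
            for term, weight in vectors[doc_id].items():
                total[term] += weight
            count += 1
    return {t: w / count for t, w in total.items()} if count else {}


def cosine(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy data: title-plus-description documents, already tokenized.
docs = {
    "d1": "python programming language tutorial".split(),
    "d2": "python code examples programming".split(),
    "d3": "football league results scores".split(),
    "d4": "football club news".split(),
}
children = {"Top": ["Computers", "Sports"]}                   # concept -> subconcepts
members = {"Computers": ["d1", "d2"], "Sports": ["d3", "d4"]}  # direct instances

vectors = tfidf_vectors(docs)
sim_computers = cosine(vectors["d1"], centroid("Computers", children, members, vectors))
sim_sports = cosine(vectors["d1"], centroid("Sports", children, members, vectors))
```

In this toy setup, d1 shares no terms with the documents under Sports, so sim_sports is zero while sim_computers is positive. Sparse dictionaries are used here only to keep the sketch dependency-free; a dense or sparse matrix library would be the more usual choice for an ontology of realistic size.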