7 Experiments
7.1 Datasets and Assumptions
Our experiments used four real datasets: Del.icio.us (a social bookmarking web
service), IMDb (the Internet Movie Database), LibraryThing (for tagging topics)
and CiteULike (a service for storing, organizing, and sharing scholarly papers).
Table 1 shows the characteristics of the datasets. User information is not
available in the IMDb dataset, so competing algorithms were unable to create
ontologies from it.
Table 1. Corpus Details in Some Collaborative Tagging Systems

                           Del.icio.us   CiteULike     IMDb          LibraryThing
                           (Dec. 2007)   (Jan. 2010)   (Nov. 2009)   (corpus from Delft [*])
Number of Tags             6,933,179     431,160       2,593,747     10,469
Number of Items            54,401,067    2,081,799     356,162       37,232
Number of Users            978,979       60,220        N/A           7,279
Number of Tag Assignments  450,113,886   7,922,454     2,625,237     2,415,517

[*] http://homepage.tudelft.nl/5q88p/LT
To show that general-purpose ontologies are insufficient, we validated that
WordNet misses many relationships between terms even when it contains both
terms. We evaluated a sample ontology (from Del.icio.us) both manually and by
using all parent-child senses (meanings) in WordNet. We limited our experiments
to relationships where both the parent and the child term exist in WordNet.
This gives WordNet an advantage, since many tags do not appear in WordNet at
all. Even in this case, we found that WordNet is missing 26.9% of the manually
validated relationships discovered by ONTECTAS. For example, WordNet contains
3 senses for “python”, but none of these senses is related to programming; as a
result, “programming → python” is missing in WordNet.
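The coverage check described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the evaluation code used in the paper: the lexical resource is replaced by a small hand-made dictionary (`TOY_PARENTS`, hypothetical data rather than real WordNet content), and the helper names `ancestors` and `missing_fraction` are invented for this example.

```python
# Toy stand-in for a lexical resource: each term maps to the set of its
# direct parents across all of its senses. Hypothetical data, NOT WordNet.
TOY_PARENTS = {
    "python": {"snake"},
    "dog": {"animal"},
    "animal": {"organism"},
}

# Terms the toy resource knows about (assumed for this example).
TOY_VOCABULARY = {"python", "snake", "dog", "animal", "organism", "programming"}


def ancestors(term, parents):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def missing_fraction(validated_pairs, parents, vocabulary):
    """Fraction of validated (parent, child) pairs absent from the resource.

    Only pairs whose two terms both exist in the resource are counted,
    mirroring the restriction described in the text.
    """
    eligible = [(p, c) for p, c in validated_pairs
                if p in vocabulary and c in vocabulary]
    if not eligible:
        return 0.0
    missing = [(p, c) for p, c in eligible if p not in ancestors(c, parents)]
    return len(missing) / len(eligible)


# Manually validated relationships (hypothetical sample).
validated = [
    ("programming", "python"),  # both terms known, relationship absent
    ("animal", "dog"),          # present as a direct parent
    ("organism", "dog"),        # present through an intermediate term
    ("web2.0", "ajax"),         # terms unknown to the resource, so excluded
]

print(missing_fraction(validated, TOY_PARENTS, TOY_VOCABULARY))
```

In this toy sample, one of the three eligible pairs is missing. With a real resource, the dictionary would be replaced by lookups over all senses of each term.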
Since our approach is successful, our hypothesis that a group of users tends
to tag items with both parent and child tags is validated. The full version of
this paper [20] shows detailed experiments which validate this empirically. We
discuss our results, beginning with has-a relationship detection.
7.2 Evaluation of ONTECTAS in Detecting has-a Relationships
Table 2 shows the precision of ONTECTAS in detecting has-a relationships.
None of the competing algorithms addresses has-a relationships from CTSs, so
Table 2 reports precision only for ONTECTAS, the first algorithm to detect
has-a relationships from CTS data.
One challenge in detecting has-a relationships was that pattern-based search
engine queries such as “human's middle” and “middle of human” are frequently
part of phrases such as “human's middle finger” and “middle of human history”.
Clearly, there is room for improvement in ONTECTAS' precision in has-a
detection, which we plan to address in future work.