ONTECTAS: Bridging the Gap between Collaborative Tagging Systems and Structured Data - Advanced Information Systems Engineering - page 443

7 Experiments

7.1 Datasets and Assumptions

Our experiments used four real datasets: Del.icio.us (a social bookmarking web service), IMDb (the Internet Movie Database), LibraryThing (for tagging topics) and CiteULike (a service for storing, organizing, and sharing scholarly papers). Table 1 shows the characteristics of the datasets. User information is not available in the IMDb dataset, so competing algorithms were unable to create ontologies from it.

Table 1. Corpus Details in Some Collaborative Tagging Systems

                            Del.icio.us    CiteULike      IMDb           LibraryThing
                            (Dec. 2007)    (Jan. 2010)    (Nov. 2009)    (corpus from Delft ∗)
Number of Tags              6,933,179      431,160        2,593,747      10,469
Number of Items             54,401,067     2,081,799      356,162        37,232
Number of Users             978,979        60,220         N/A            7,279
Number of Tag Assignments   450,113,886    7,922,454      2,625,237      2,415,517

∗ http://homepage.tudelft.nl/5q88p/LT

To show that general-purpose ontologies are insufficient, we verified that WordNet misses many relationships between terms even when it contains both terms. We evaluated a sample ontology (from Del.icio.us) both manually and by using all parent-child senses (meanings) in WordNet. We limited this experiment to relationships where both the parent and the child term exist in WordNet. This gives WordNet an advantage, since many tags do not appear in WordNet at all. Even so, we found that WordNet is missing 26.9% of the manually validated relationships discovered by ONTECTAS. For example, WordNet contains 3 senses for “python”, but none of these senses is related to programming; as a result, “programming → python” is missing in WordNet.
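The coverage check described above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the sense inventory `TOY_SENSES`, its ancestor sets, and the function names are all hypothetical stand-ins for a real WordNet lookup. A relationship counts as covered if any sense of the child lists the parent among its ancestors, and only relationships whose two terms both appear in the inventory are counted.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy sense inventory (NOT real WordNet data):
# term -> list of senses, each sense given as the set of its ancestor terms.
TOY_SENSES: Dict[str, List[Set[str]]] = {
    "python": [{"snake", "reptile", "animal"}, {"spirit"}, {"dragon"}],
    "dog": [{"canine", "animal"}],
    "programming": [{"activity"}],
    "animal": [{"organism"}],
}


def is_covered(parent: str, child: str, senses: Dict[str, List[Set[str]]]) -> bool:
    """True if any sense of `child` has `parent` among its ancestors."""
    return any(parent in ancestors for ancestors in senses.get(child, []))


def missing_fraction(
    relationships: List[Tuple[str, str]], senses: Dict[str, List[Set[str]]]
) -> float:
    """Fraction of (parent, child) pairs not covered by the inventory,
    restricted to pairs where both terms exist in the inventory."""
    eligible = [(p, c) for p, c in relationships if p in senses and c in senses]
    if not eligible:
        return 0.0
    missing = [(p, c) for p, c in eligible if not is_covered(p, c, senses)]
    return len(missing) / len(eligible)


# Hypothetical manually validated relationships (parent, child).
validated = [
    ("programming", "python"),  # both terms known, but no programming sense
    ("animal", "dog"),          # covered
    ("animal", "python"),       # covered via the snake sense
    ("web", "ajax"),            # skipped: terms absent from the inventory
]
fraction = missing_fraction(validated, TOY_SENSES)
```

Restricting the denominator to pairs with both terms present mirrors the advantage given to WordNet in the text: tags that WordNet does not contain at all are not held against it.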

The success of our approach supports our hypothesis that a group of users tends to tag items with both parent and child tags. The full version of this paper [20] shows detailed experiments which validate this empirically. We discuss our results, beginning with has-a relationship detection.
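The hypothesis that users tend to tag items with both parent and child tags can be illustrated with a simple co-occurrence measure. The sketch below is not the ONTECTAS algorithm and does not reproduce the experiments in [20]; the function name, the `(user, item, tag)` tuple layout, and the sample data are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def cooccurrence_rate(
    assignments: Iterable[Tuple[str, str, str]], parent: str, child: str
) -> float:
    """Fraction of items tagged `child` that also carry `parent`,
    pooling the tags of all users who annotated each item."""
    tags_by_item: Dict[str, Set[str]] = defaultdict(set)
    for _user, item, tag in assignments:
        tags_by_item[item].add(tag)
    with_child = [tags for tags in tags_by_item.values() if child in tags]
    if not with_child:
        return 0.0
    return sum(parent in tags for tags in with_child) / len(with_child)


# Hypothetical tag assignments: (user, item, tag).
sample = [
    ("u1", "i1", "programming"),
    ("u2", "i1", "python"),
    ("u1", "i2", "python"),
    ("u3", "i2", "programming"),
    ("u2", "i3", "python"),
    ("u3", "i4", "snake"),
]
rate = cooccurrence_rate(sample, "programming", "python")
```

A high rate for a candidate pair is consistent with the hypothesis, although on its own it says nothing about which of the two tags is the parent.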

7.2 Evaluation of ONTECTAS in Detecting has-a Relationships

Table 2 shows the precision of ONTECTAS in detecting has-a relationships. None of the competing algorithms addresses has-a relationships from CTSs, so Table 2 only reports precision for ONTECTAS, the first algorithm to detect has-a relationships from CTS data.

One challenge in detecting has-a relationships was that pattern-based search engine queries such as “human's middle” and “middle of human” are frequently part of phrases such as “human's middle finger” and “middle of human history”. Clearly, there is room for improvement in ONTECTAS' precision in has-a detection, which we plan to address in future work.
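The false-positive problem with pattern-based queries can be reproduced in a few lines. This sketch only assumes the two query patterns quoted above ("X's Y" and "Y of X"); the function names and the sample snippets are hypothetical, and plain substring matching stands in for a real search engine.

```python
from typing import List


def has_a_queries(whole: str, part: str) -> List[str]:
    """Build the two has-a query patterns for a candidate (whole, part) pair."""
    return [f"{whole}'s {part}", f"{part} of {whole}"]


def naive_hits(queries: List[str], snippets: List[str]) -> List[str]:
    """Return snippets containing any query as a plain substring.
    Stand-in for search engine hits; it cannot tell whether the
    query is only a fragment of a longer phrase."""
    return [s for s in snippets if any(q in s.lower() for q in queries)]


queries = has_a_queries("human", "middle")
# Hypothetical text snippets standing in for search results.
snippets = [
    "The human's middle finger is usually the longest.",
    "Writing emerged in the middle of human history.",
    "A car's engine needs regular maintenance.",
]
hits = naive_hits(queries, snippets)
```

Both matching snippets would count as evidence for "human has-a middle", even though in each case the query is only a fragment of a longer phrase.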
