Community Detection in Collaborative Tagging Systems - Community-Built Databases: Research and Development


5.4 Experimental Evaluation

To gain insight into the behavior of community detection in real-world tagging
systems, we conducted an empirical evaluation of the performance of two community
detection methods on three datasets coming from different tagging applications,
namely BibSonomy, Flickr, and Delicious. The first of the two methods under study
is the well-known greedy modularity maximization scheme presented by Clauset et
al. [31] (see footnote 8), and the second is the scheme that we presented in
Sect. 5.3. We will use the abbreviations CNM (after the authors Clauset, Newman,
and Moore) and HCD (standing for Hybrid Community Detection) to denote the two
methods. The three datasets that we used for our study are described in the
following, and basic information on their size is presented in Table 5.2.
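As a reminder of what greedy modularity maximization does, the following is a minimal sketch in plain Python. It is not the CNM implementation used in the experiments: it recomputes modularity from scratch for every candidate merge, whereas the published algorithm relies on efficient data structures to scale to large graphs. The function names and the toy graph (two triangles joined by one edge) are illustrative assumptions, and the graph is assumed to be undirected, unweighted, and to contain at least one edge.

```python
from itertools import combinations


def modularity(edges, communities):
    """Modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    label = {node: i for i, comm in enumerate(communities) for node in comm}
    internal = [0] * len(communities)    # edges with both ends in community i
    degree_sum = [0] * len(communities)  # summed node degrees of community i
    for u, v in edges:
        degree_sum[label[u]] += 1
        degree_sum[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(
        internal[i] / m - (degree_sum[i] / (2 * m)) ** 2
        for i in range(len(communities))
    )


def greedy_modularity(edges):
    """Naive agglomeration: repeatedly apply the merge that raises Q the most."""
    nodes = sorted({n for edge in edges for n in edge})
    communities = [frozenset([n]) for n in nodes]  # start from singletons
    best_q = modularity(edges, communities)
    while len(communities) > 1:
        candidates = []
        for a, b in combinations(communities, 2):
            merged = [c for c in communities if c not in (a, b)] + [a | b]
            candidates.append((modularity(edges, merged), merged))
        q, merged = max(candidates, key=lambda item: item[0])
        if q <= best_q:  # no merge improves modularity: stop
            break
        best_q, communities = q, merged
    return communities, best_q


# Hypothetical toy graph: two triangles connected by the edge (2, 3).
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
toy_communities, toy_q = greedy_modularity(toy_edges)
print(sorted(sorted(c) for c in toy_communities), round(toy_q, 3))
# prints [[0, 1, 2], [3, 4, 5]] 0.357
```

On this toy graph the procedure stops at the two triangles, because merging them across the single bridging edge would lower modularity.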

BIBSONOMY-200K: BibSonomy is a social bookmarking and publication sharing
application focused on research literature. The BibSonomy dataset was made
available through the ECML PKDD Discovery Challenge 2009 (see footnote 9). We
used the “Post-Core” version of the dataset, which consists of a little more than
200,000 tag assignments (triplets), hence the label “200K” in the dataset name.

FLICKR-1M: Flickr is a popular photo-sharing and organizing application on the
Web, featuring billions of tagged images. For our experiments, we used a focused
subset of Flickr comprising approximately 120,000 images that were located within
the city of Barcelona (the images contained geolocation information). In total,
the number of tag assignments for this dataset approaches one million.

DELICIOUS-7M: Delicious is a popular social bookmarking service that enables
users to manage and share their bookmark collections online. We made use of a
small snapshot of the Delicious bookmark collection corresponding to January
2006, comprising more than seven million tag assignments. This dataset is a
subset of the collection studied in [34].
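Each dataset above is a set of tag assignments, that is, (user, resource, tag) triplets. The sketch below shows one way the size figures reported for such a dataset (number of triplets, distinct users, resources, and tags) could be computed. The function name and the toy triplets are hypothetical and are not taken from the datasets used in the study.

```python
def folksonomy_stats(triplets):
    """Count tag assignments and distinct users, resources, and tags."""
    unique = set(triplets)  # a tag assignment is a unique (user, resource, tag)
    return {
        "triplets": len(unique),
        "users": len({user for user, _, _ in unique}),
        "resources": len({resource for _, resource, _ in unique}),
        "tags": len({tag for _, _, tag in unique}),
    }


# Hypothetical toy folksonomy.
toy_triplets = [
    ("alice", "r1", "graph"),
    ("alice", "r1", "network"),
    ("bob", "r1", "graph"),
    ("bob", "r2", "clustering"),
    ("carol", "r3", "graph"),
]
print(folksonomy_stats(toy_triplets))
# prints {'triplets': 5, 'users': 3, 'resources': 3, 'tags': 3}
```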

Starting from each dataset, we built a resource-based tag co-occurrence graph as
described in Sect. 5.2.1. The raw graph contained one large component and several
very small components and isolated nodes. For the experiments we used only the
large component of each graph, which accounts for more than 99% of the size of
the raw graph for all three datasets. Some basic statistics of the analyzed large

Table 5.2 Folksonomy datasets used for evaluation

Dataset           #triplets   U (users)   R (resources)   T (tags)
BIBSONOMY-200K      234,403       1,185          64,119     12,216
FLICKR-1M           927,473       5,463         123,585     27,969
DELICIOUS-7M      7,501,032     112,950       1,332,796    251,352
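The graph construction and component filtering described before Table 5.2 can be sketched as follows. Since Sect. 5.2.1 is not reproduced here, the sketch assumes that two tags are linked when they are assigned to the same resource, with the edge weight counting the resources they share. The function names and toy triplets are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations


def tag_cooccurrence_graph(triplets):
    """Assumed construction: link two tags that annotate the same resource."""
    tags_of = defaultdict(set)
    for _user, resource, tag in triplets:
        tags_of[resource].add(tag)
    nodes, weight = set(), defaultdict(int)
    for tags in tags_of.values():
        nodes.update(tags)  # tags with no co-occurrence stay as isolated nodes
        for t1, t2 in combinations(sorted(tags), 2):
            weight[(t1, t2)] += 1  # number of resources shared by t1 and t2
    return nodes, dict(weight)


def largest_component(nodes, weight):
    """Return the node set of the largest connected component (BFS)."""
    adjacency = {node: set() for node in nodes}
    for t1, t2 in weight:
        adjacency[t1].add(t2)
        adjacency[t2].add(t1)
    seen, best = set(), set()
    for start in nodes:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node] - component:
                component.add(neighbor)
                queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Hypothetical toy folksonomy.
toy_triplets = [
    ("u1", "r1", "barcelona"), ("u1", "r1", "gaudi"),
    ("u2", "r2", "gaudi"), ("u2", "r2", "architecture"),
    ("u3", "r3", "python"), ("u3", "r3", "code"),
    ("u3", "r4", "misc"),
    ("u3", "r5", "barcelona"), ("u3", "r5", "gaudi"),
]
toy_nodes, toy_weight = tag_cooccurrence_graph(toy_triplets)
print(sorted(largest_component(toy_nodes, toy_weight)))
# prints ['architecture', 'barcelona', 'gaudi']
```

In this toy example the large component holds three of the six tags. In the datasets above it accounts for more than 99% of the raw graph, so discarding the small components and isolated nodes removes very little.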

8 We used the publicly available implementation of this algorithm, which we downloaded from http://www.cs.unm.edu/~aaron/research/fastmodularity.htm

9 http://www.kde.cs.uni-kassel.de/ws/dc09
