5.4 Experimental Evaluation
To gain insight into the behavior of community detection in real-world
tagging systems, we empirically evaluate the performance of two
community detection methods on three datasets from different tagging
applications, namely BibSonomy, Flickr, and Delicious. The first of the two
methods is the well-known greedy modularity maximization scheme of
Clauset et al. [31] (see footnote 8), and the second is the scheme
that we presented in Sect. 5.3. We use the abbreviations CNM and HCD
(standing for Hybrid Community Detection) to denote the two methods. The
three datasets used in our study are described below, and basic
information on their size is presented in Table 5.2.
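As a rough illustration of what greedy modularity maximization does, the sketch below starts with every node in its own community and repeatedly merges the pair of connected communities that yields the largest modularity gain, stopping when no merge improves modularity. This is a naive pure-Python version written for readability, assuming an unweighted, undirected edge list without duplicate edges or self-loops; it does not reproduce the efficient data structures of CNM or the implementation used in the experiments. The function names and the toy graph are hypothetical.

```python
def modularity(edges, communities):
    """Modularity Q of a partition of an unweighted, undirected graph."""
    m = len(edges)
    label = {node: idx for idx, comm in enumerate(communities) for node in comm}
    internal = [0] * len(communities)  # edges with both endpoints in the community
    degree = [0] * len(communities)    # sum of node degrees per community
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2
               for c in range(len(communities)))


def greedy_modularity(edges):
    """Naive agglomerative merging by largest modularity gain (illustrative only)."""
    m = len(edges)
    nodes = sorted({n for edge in edges for n in edge})
    communities = [{n} for n in nodes]
    while len(communities) > 1:
        label = {n: i for i, comm in enumerate(communities) for n in comm}
        degree = [0] * len(communities)
        between = {}  # (i, j) with i < j -> number of edges joining i and j
        for u, v in edges:
            a, b = label[u], label[v]
            degree[a] += 1
            degree[b] += 1
            if a != b:
                key = (min(a, b), max(a, b))
                between[key] = between.get(key, 0) + 1
        best_gain, best_pair = 0.0, None
        for (a, b), e_ab in sorted(between.items()):
            # Gain of merging a and b: e_ab / m - d_a * d_b / (2 m^2)
            gain = e_ab / m - degree[a] * degree[b] / (2 * m * m)
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:  # no merge increases modularity
            break
        a, b = best_pair
        communities[a] |= communities[b]
        del communities[b]
    return communities


# Toy graph: two triangles joined by a single bridge edge (c, d).
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
toy_communities = greedy_modularity(toy_edges)
toy_q = modularity(toy_edges, toy_communities)
```

Only pairs of communities that share at least one edge are considered for merging, since merging unconnected communities can never increase modularity.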
BIBSONOMY-200K: BibSonomy is a social bookmarking and publication sharing
application focused on research literature. The BibSonomy dataset was made
available through the ECML PKDD Discovery Challenge 2009 (see footnote 9).
We used the “Post-Core” version of the dataset, which consists of a little
more than 200,000 tag assignments (triplets); hence the label “200K” in the
dataset name.
FLICKR-1M: Flickr is a popular Web application for sharing and organizing
photos, featuring billions of tagged images. For our experiments, we used a
focused subset of Flickr comprising approximately 120,000 images that were
located within the city of Barcelona (the images contained geolocation
information). In total, the number of tag assignments for this dataset
approaches one million.
DELICIOUS-7M: Delicious is a popular social bookmarking service that enables
users to manage and share their bookmark collections online. We made use of a
small snapshot of the Delicious bookmark collection corresponding to January
2006, comprising seven million tag assignments. This dataset is a subset of the
collection studied in [34].
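The size of a folksonomy dataset of this kind can be summarized by counting its tag assignments and the distinct users, resources, and tags they involve. The sketch below assumes each tag assignment is a (user, resource, tag) tuple and that exact duplicate assignments are counted once; both the function name and the sample data are hypothetical and are not taken from the datasets above.

```python
def folksonomy_stats(triplets):
    """Count tag assignments and distinct users, resources, and tags."""
    unique = set(triplets)  # assumption: duplicate assignments count once
    return {
        "triplets": len(unique),
        "users": len({user for user, _, _ in unique}),
        "resources": len({resource for _, resource, _ in unique}),
        "tags": len({tag for _, _, tag in unique}),
    }


# Hypothetical sample: two users tagging two resources.
sample_triplets = [
    ("u1", "r1", "python"),
    ("u1", "r1", "tutorial"),
    ("u2", "r1", "python"),
    ("u2", "r2", "graphs"),
    ("u2", "r2", "graphs"),  # exact duplicate, counted once
]
sample_stats = folksonomy_stats(sample_triplets)
```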
Starting from each dataset, we built a resource-based tag co-occurrence graph as
described in Sect. 5.2.1. Each raw graph contained one large component, several
very small components, and isolated nodes. For the experiments we used only the
large component of each graph, which accounts for more than 99% of the size of
the raw graph for all three datasets. Some basic statistics of the analyzed large
Table 5.2 Folksonomy datasets used for evaluation (U: users, R: resources, T: tags)
Dataset           #triplets    U          R            T
BIBSONOMY-200K    234,403      1,185      64,119       12,216
FLICKR-1M         927,473      5,463      123,585      27,969
DELICIOUS-7M      7,501,032    112,950    1,332,796    251,352
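The graph construction and large-component selection described above can be sketched as follows. Since Sect. 5.2.1 is not reproduced here, the sketch assumes that two tags are linked whenever they are assigned to the same resource, with the edge weight equal to the number of resources they share; the function names and the sample triplets are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_cooccurrence_graph(triplets):
    """Build a weighted tag graph: tags are linked if they share a resource."""
    tags_by_resource = defaultdict(set)
    all_tags = set()
    for _user, resource, tag in triplets:
        tags_by_resource[resource].add(tag)
        all_tags.add(tag)
    graph = {tag: {} for tag in all_tags}  # isolated tags keep an empty entry
    for tags in tags_by_resource.values():
        for t1, t2 in combinations(sorted(tags), 2):
            graph[t1][t2] = graph[t1].get(t2, 0) + 1
            graph[t2][t1] = graph[t2].get(t1, 0) + 1
    return graph


def largest_component(graph):
    """Return the node set of the largest connected component (BFS)."""
    seen, best = set(), set()
    for start in graph:
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Hypothetical tag assignments: (user, resource, tag).
example_triplets = [
    ("u1", "r1", "barcelona"), ("u1", "r1", "gaudi"),
    ("u2", "r1", "architecture"),
    ("u2", "r2", "gaudi"), ("u2", "r2", "sagrada"),
    ("u3", "r3", "cat"), ("u3", "r3", "pet"),
    ("u3", "r4", "misc"),  # a tag that co-occurs with nothing
]
example_graph = build_cooccurrence_graph(example_triplets)
example_core = largest_component(example_graph)
# Share of the raw graph's nodes retained in the large component.
example_coverage = len(example_core) / len(example_graph)
```

In this toy input the large component covers only part of the graph; on the real datasets the corresponding share is reported to exceed 99%.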
8 We used the publicly available implementation of this algorithm, which we downloaded from http://www.cs.unm.edu/~aaron/research/fastmodularity.htm
9 http://www.kde.cs.uni-kassel.de/ws/dc09