To enrich this dataset, we gathered data about information consumers and social
interactions from the academic social network CITEULIKE. We collected all social
bookmarks targeting the SIGIR publications and we extracted related tags and
corresponding users.
The following paragraphs describe the dataset characteristics and evaluation
measures:
• Social network statistics. The SIGIR dataset includes 2,871 authors with an
average of 2 coauthorships and 16 citation links per author. As shown in
Table 6.1, citation relationships dominate the social network, outnumbering
coauthorship associations about nine to one. In fact, the inclusion of citation
links restructures small and dispersed components into larger author
communities. Consequently, the giant component connecting the majority of author
nodes is enlarged by citation relationships to include 84% of authors, as shown
in Fig. 6.3.
• Queries and relevance assumption. Tags are user-generated keywords used to
annotate document content. They help users index a document from their own
perspective and consequently correspond to later information needs that this
document may satisfy. Unlike terms automatically extracted from
textual context, tags seem better suited to represent queries, since both
are user-generated terms expressing information needs. Thus, we propose
to choose tags assigned to the SIGIR publications as representative queries
in our experiments. We assume that popular tags are more important in the
social context. Thus, we select as queries the most frequent tags assigned to the
SIGIR publications, then we build the ground truth through the following steps:
1. We select as initial queries the top 100 tags sorted by total bookmarks
targeting the SIGIR publications (popular tags).
2. We remove personal and empty tags such as “to read” and “sigir.”
3. We regroup similar tags with different forms, like “language model” and
“language modeling.”
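
Steps 1 to 3 above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name select_queries, the (user, publication, tag) triple layout, the stop list, and the variant map are all assumptions, and the source names only the example tags "to read", "sigir", "language model", and "language modeling".

```python
from collections import Counter

# Hypothetical stop list and variant map, built from the examples in the text.
PERSONAL_OR_EMPTY = {"to read", "sigir"}
VARIANTS = {"language modeling": "language model"}


def select_queries(bookmarks, top_n=100):
    """Return (tag, bookmark count) pairs for candidate query tags.

    bookmarks: iterable of (user, publication, tag) triples (assumed layout).
    """
    counts = Counter(tag.strip().lower() for _, _, tag in bookmarks)
    # Step 1: keep the top_n tags by total bookmarks.
    popular = counts.most_common(top_n)
    # Step 2: drop personal and empty tags.
    kept = [(t, c) for t, c in popular if t and t not in PERSONAL_OR_EMPTY]
    # Step 3: merge variant forms under one canonical tag.
    merged = Counter()
    for tag, count in kept:
        merged[VARIANTS.get(tag, tag)] += count
    return merged.most_common()


# Toy data, invented for illustration only.
bookmarks = [
    ("u1", "p1", "language model"),
    ("u2", "p1", "language modeling"),
    ("u2", "p2", "to read"),
    ("u3", "p2", "sigir"),
    ("u3", "p3", "evaluation"),
    ("u1", "p3", "language model"),
]
queries = select_queries(bookmarks)
```

On the toy data, the two "language model" variants are merged into a single query and the personal tags are discarded.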
Table 6.1 Social network properties of the SIGIR dataset

Authors                              2,871
Coauthorships                        5,047
Citation links                       45,880
Coauthorship and/or citation links   52,516
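
The per-author averages and the nine-to-one ratio quoted in the text can be checked against Table 6.1 with a few lines of Python. The variable names are illustrative, and the sketch assumes the averages were obtained by dividing link counts by the number of authors, which is what the rounded figures suggest.

```python
# Counts copied from Table 6.1.
authors = 2871
coauthorships = 5047
citation_links = 45880

# Assumption: "per author" means links divided by authors.
coauth_per_author = coauthorships / authors
citations_per_author = citation_links / authors
citation_to_coauth_ratio = citation_links / coauthorships

print(f"coauthorships per author:  {coauth_per_author:.2f}")         # 1.76
print(f"citation links per author: {citations_per_author:.2f}")      # 15.98
print(f"citation/coauthorship:     {citation_to_coauth_ratio:.2f}")  # 9.09
```

Rounded to whole numbers, these give the 2 coauthorships and 16 citation links per author, and the factor of nine, reported above.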

Fig. 6.3 The giant component of the SIGIR social network
(figure not reproduced; y-axis: 0% to 100%; series: giant component, others;
x-axis: A, co-authorship network; C, citation network; AC, co-authorship and/or
citation network)
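
The share of authors in the giant component, as plotted in Fig. 6.3, can be computed as in the sketch below. It uses a union-find structure and treats every link, including citations, as undirected, which is an assumption. The function name giant_component_share and the toy graph are invented for illustration and are not the SIGIR data.

```python
from collections import Counter


def giant_component_share(nodes, edges):
    """Fraction of nodes in the largest connected component (undirected)."""
    nodes = list(nodes)
    parent = {n: n for n in nodes}

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)

    sizes = Counter(find(n) for n in nodes)
    return max(sizes.values()) / len(nodes)


# Toy author graph (hypothetical).
authors = ["a", "b", "c", "d", "e", "f"]
coauthor_edges = [("a", "b"), ("c", "d")]
citation_edges = [("b", "c"), ("d", "e")]

share_a = giant_component_share(authors, coauthor_edges)
share_c = giant_component_share(authors, citation_edges)
share_ac = giant_component_share(authors, coauthor_edges + citation_edges)
```

In the toy graph, each network alone leaves only pairs of authors connected, while the combined network joins five of the six authors. This mirrors the effect described in the text, where citation links merge small components into a larger one.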