On Using Social Context to Model Information Retrieval and Collaboration in Scientific Research Community - Community-Built Databases: Research and Development - page 147

To enrich this dataset, we gathered data about information consumers and social interactions from the academic social network CITEULIKE. We collected all social bookmarks targeting the SIGIR publications and extracted the related tags and corresponding users.

The following paragraphs describe the dataset characteristics and evaluation measures:

• Social network statistics. The SIGIR dataset includes 2,871 authors with an average of 2 coauthorships and 16 citation links per author. As shown in Table 6.1, citation relationships dominate the social network, with about nine times as many links as coauthorship associations. In fact, including citation links merges small, dispersed components into larger author communities. Consequently, the giant component connecting the majority of author nodes grows with citation relationships to include 84% of authors, as shown in Fig. 6.3.

• Queries and relevance assumption. Tags are user-generated keywords used to annotate document content. They help users index a document from their own perspective and consequently correspond to later information needs that this document may satisfy. Unlike terms extracted automatically from the textual context, tags seem better suited to represent queries, since both are user-generated terms expressing information needs. Thus, we propose to use the tags assigned to the SIGIR publications as representative queries in our experiments. We assume that popular tags are more important in the social context. Thus, we select as queries the most frequent tags assigned to the SIGIR publications, then we build the ground truth through the following steps:

1. We select as initial queries the top 100 tags sorted by total bookmarks targeting the SIGIR publications (popular tags).

2. We remove personal and empty tags such as “to read” and “sigir.”

3. We regroup similar tags with different forms, like “language model” and “language modeling.”
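The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the bookmark records, the function name select_queries, the stop-tag set, and the alias map are all hypothetical, and the only values taken from the text are the top-100 cutoff and the example tags.

```python
from collections import Counter

# Hypothetical inputs: each record is (user, publication_id, tag).
# The stop tags and the alias pair are the examples quoted in the text;
# a real run would need a hand-built list for both.
STOP_TAGS = {"to read", "sigir"}
ALIASES = {"language modeling": "language model"}


def select_queries(bookmarks, top_n=100, stop_tags=STOP_TAGS, aliases=ALIASES):
    """Return candidate query tags, most bookmarked first."""
    # Step 1: rank tags by the number of bookmarks that carry them.
    counts = Counter(tag.strip().lower() for _, _, tag in bookmarks)
    top = counts.most_common(top_n)

    # Step 2: drop personal and empty tags.
    kept = [(tag, n) for tag, n in top if tag and tag not in stop_tags]

    # Step 3: merge variant forms of the same tag and add up their counts.
    merged = Counter()
    for tag, n in kept:
        merged[aliases.get(tag, tag)] += n
    return [tag for tag, _ in merged.most_common()]


# Toy data, invented for this sketch.
bookmarks = [
    ("u1", "p1", "language model"),
    ("u2", "p1", "language modeling"),
    ("u3", "p2", "language modeling"),
    ("u1", "p2", "to read"),
    ("u2", "p2", "to read"),
    ("u3", "p1", "to read"),
    ("u4", "p1", "to read"),
    ("u1", "p3", "evaluation"),
    ("u2", "p3", "evaluation"),
    ("u3", "p3", "sigir"),
]
queries = select_queries(bookmarks)
print(queries)  # ['language model', 'evaluation']
```

Filtering after the top-100 cut follows the order of the steps as listed, so the final query set can be smaller than 100.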

Table 6.1 Social network properties of the SIGIR dataset

Authors                                2,871
Coauthorships                          5,047
Citation links                        45,880
Coauthorship and/or citation links    52,516
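The per-author averages and the "nine times" ratio quoted in the text can be checked against the counts in Table 6.1. The short calculation below assumes that the averages are simply link counts divided by the number of authors, which is what the rounded figures suggest; the variable names are chosen for this sketch.

```python
# Counts taken from Table 6.1.
authors = 2871
coauthorships = 5047
citation_links = 45880

# Assumption: "average per author" means links divided by authors.
avg_coauthorships = coauthorships / authors        # about 1.76, rounds to 2
avg_citation_links = citation_links / authors      # about 15.98, rounds to 16
citation_ratio = citation_links / coauthorships    # about 9.09

print(round(avg_coauthorships, 2), round(avg_citation_links, 2), round(citation_ratio, 2))
```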

Fig. 6.3 The giant component of the SIGIR social network. [Figure not reproduced. Vertical axis: share of authors, 0% to 100%. Series: Giant component, Others. Categories: A, co-authorship network; C, citation network; AC, co-authorship and/or citation network.]
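The share of authors in the giant component, as compared across networks in Fig. 6.3, can be computed with a union-find pass over the links. The sketch below is a minimal illustration, not the procedure used in the chapter: it assumes that links are treated as undirected when components are formed, and the function name giant_component_share and the six-author toy graph are invented for the example.

```python
from collections import Counter


def giant_component_share(nodes, edges):
    """Fraction of nodes in the largest connected component.

    Assumption: every edge is treated as undirected, so a citation
    link connects two authors regardless of its direction.
    """
    nodes = list(nodes)
    if not nodes:
        return 0.0
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    sizes = Counter(find(n) for n in nodes)
    return max(sizes.values()) / len(nodes)


# Toy network, invented for this sketch.
toy_authors = ["a", "b", "c", "d", "e", "f"]
toy_coauthorships = [("a", "b"), ("c", "d")]
toy_citations = [("b", "c")]

share_a = giant_component_share(toy_authors, toy_coauthorships)
share_ac = giant_component_share(toy_authors, toy_coauthorships + toy_citations)
print(share_a, share_ac)
```

In the toy network a single citation link joins two coauthor pairs, so the largest component grows from two of six authors to four of six, the same kind of effect the text describes for the SIGIR data.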
