On Using Social Context to Model Information Retrieval and Collaboration in Scientific Research Community - Community-Built Databases: Research and Development


4. For each query, we collect the documents bookmarked at least once with the corresponding tag or one of its similar forms.

5. From the previous list of documents corresponding to a query tag, we select

only the documents having the query tag among their three top assigned tags.

The final document set corresponds to the query relevant documents.

6. We remove the query tag if no relevant document is found.
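Steps 4 to 6 can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the `bookmarks` structure (one tag entry per assignment), the `similar_forms` mapping, and the function names are all hypothetical, and "top assigned tags" is assumed to mean the most frequently assigned ones.

```python
from collections import Counter


def relevant_documents(query_tag, bookmarks, similar_forms=(), top_k=3):
    """Return the documents judged relevant to query_tag (steps 4 and 5).

    bookmarks: dict mapping a document id to the list of tags assigned to it,
    one list entry per assignment (hypothetical layout).
    """
    forms = {query_tag, *similar_forms}
    relevant = set()
    for doc_id, tags in bookmarks.items():
        # Fold similar forms into the query tag before counting assignments.
        counts = Counter(query_tag if t in forms else t for t in tags)
        # Step 4: keep documents bookmarked at least once with the tag.
        if counts[query_tag] == 0:
            continue
        # Step 5: keep documents whose top_k most assigned tags include the tag.
        # Ties are broken by first occurrence, which is an arbitrary choice here.
        top_tags = [t for t, _ in counts.most_common(top_k)]
        if query_tag in top_tags:
            relevant.add(doc_id)
    return relevant


def build_relevance_judgments(query_tags, bookmarks, similar_forms=None):
    """Map each query tag to its relevant documents, dropping empty queries."""
    similar_forms = similar_forms or {}
    judgments = {}
    for tag in query_tags:
        docs = relevant_documents(tag, bookmarks, similar_forms.get(tag, ()))
        if docs:  # Step 6: remove the query tag if no relevant document is found.
            judgments[tag] = docs
    return judgments
```

Folding similar forms into the query tag before counting is one possible reading of step 5; counting each form separately would give a stricter filter.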

We retain for experimentation the top 25 queries and their corresponding

relevant documents. The final collection includes 512 relevant documents with

an average of 20 relevant documents per query. To index the dataset, we used the open-source information retrieval library Apache Lucene (footnote 7), which is based on a modified scoring function of the vector-space model described in [54].

• Evaluation measures. In order to compare the social importance measures and evaluate our model performance, we use recall and precision. Users are commonly interested in the top results; therefore, we study precision at 0.1 and 0.2

points of recall. With an average of 800 retrieved documents per query, these

recall points correspond to the first 160 documents.
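To make the measure concrete, the sketch below computes precision at a given recall point for one ranked result list. It is an assumption-laden illustration: the function names are ours, and it uses the uninterpolated precision at the first rank where the recall level is reached, which may differ from the exact variant used in the experiments.

```python
def precision_at_recall(ranking, relevant, recall_level):
    """Precision at the first rank where recall reaches recall_level.

    ranking: list of document ids, best first.
    relevant: set of relevant document ids for the query.
    Returns 0.0 if the recall level is never reached.
    """
    if not relevant:
        return 0.0
    hits = 0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            if hits / len(relevant) >= recall_level:
                return hits / rank
    return 0.0


def mean_precision_at_recall(rankings, judgments, recall_level):
    """Average precision_at_recall over all queries present in judgments."""
    values = [
        precision_at_recall(rankings.get(query, []), docs, recall_level)
        for query, docs in judgments.items()
    ]
    return sum(values) / len(values) if values else 0.0
```

For example, with 10 relevant documents, recall 0.1 is reached at the first relevant document retrieved and recall 0.2 at the second, so both points only depend on the head of the ranking.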

6.5.3.2 Comparison of Social Importance Measures

The social importance measures highlight key entities in the social network and

include measures introduced in both social network analysis [49] and hyperlink analysis [46, 47]. These measures have multiple semantics that vary from one social application to another. In the context of scientific publications, the Betweenness measure is considered an indicator of interdisciplinarity and highlights authors connecting dispersed sectors of the scientific community. The Closeness measure, based on the shortest path in the graph, reflects the reachability and independence of an author in their social neighborhood. The PageRank measure and

the Authority score computed by the HITS algorithm distinguish the authoritative

resources in the social network. By contrast, the Hub score computed by the HITS

algorithm identifies authors having an important social activity and relying on

authoritative resources, and these authors are called Centrals. We applied these social importance measures to both a binary and a weighted model of the social network. We denote by W-Betweenness the application of the Betweenness measure to the weighted model of the social network, and we use the same notation for the other social importance measures.
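As an illustration of how the Hub and Authority scores behave on the binary and weighted models, the following sketch runs the HITS power iteration on a small directed graph. The edge representation (a dictionary from author pairs to weights), the function names, and the fixed iteration count are assumptions made for this example; the construction of the actual social network is not reproduced here.

```python
import math


def _normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {node: v / norm for node, v in scores.items()}


def hits(edges, iterations=50):
    """Return (hub, authority) score dictionaries for a directed graph.

    edges: dict mapping (source, target) to a positive weight.
    """
    nodes = {node for edge in edges for node in edge}
    hub = {node: 1.0 for node in nodes}
    authority = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority: weighted sum of the hub scores of incoming neighbors.
        authority = {node: 0.0 for node in nodes}
        for (source, target), weight in edges.items():
            authority[target] += weight * hub[source]
        authority = _normalize(authority)
        # Hub: weighted sum of the authority scores of outgoing neighbors.
        hub = {node: 0.0 for node in nodes}
        for (source, target), weight in edges.items():
            hub[source] += weight * authority[target]
        hub = _normalize(hub)
    return hub, authority


def binarize(edges):
    """Binary model: keep the same links but give each one a weight of 1."""
    return {edge: 1.0 for edge in edges}
```

Calling `hits(binarize(edges))` gives the scores for the binary model and `hits(edges)` those for the weighted model, corresponding to the Hub and W-Hub notation used in the text.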

Table 6.2 presents comparative effectiveness results of the different importance

measures for both binary and weighted models of the social network. These results

are obtained using only the social importance score of documents, by setting α = 0 in (6.8). We note that the Hub measure ranks scientific papers better than the other measures for both the binary and weighted models of the social network. We conclude that the importance of scientific publications can be estimated from the Centrality of their authors.

7 http://lucene.apache.org/
