4. For each query, we collect documents bookmarked at least once by the corres-
ponding tag or its similar forms.
5. From the previous list of documents corresponding to a query tag, we select
only the documents having the query tag among their three top assigned tags.
The final document set corresponds to the query relevant documents.
6. We remove the query tag if no relevant document is found.
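Steps 4 to 6 above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the `(doc_id, tag)` bookmark format, the `variants` mapping from similar tag forms to a canonical tag, and the function name `relevant_documents` are all assumptions, and ties among the top assigned tags are broken by first occurrence.

```python
from collections import Counter, defaultdict


def relevant_documents(bookmarks, query_tags, variants=None, top_k=3):
    """Build a query -> relevant-documents map from tagging data.

    bookmarks:  iterable of (doc_id, tag) pairs, one per bookmark (assumed format).
    query_tags: candidate query tags.
    variants:   hypothetical map from a similar tag form to its canonical tag.
    """
    variants = variants or {}

    # Count how often each (canonical) tag was assigned to each document.
    tag_counts = defaultdict(Counter)
    for doc_id, tag in bookmarks:
        tag_counts[doc_id][variants.get(tag, tag)] += 1

    qrels = {}
    for query in query_tags:
        # Step 4: documents bookmarked at least once with the tag or a similar form.
        candidates = [d for d, counts in tag_counts.items() if counts[query] > 0]

        # Step 5: keep documents whose top_k assigned tags include the query tag.
        relevant = {
            d for d in candidates
            if query in [t for t, _ in tag_counts[d].most_common(top_k)]
        }

        # Step 6: drop the query tag if no relevant document is found.
        if relevant:
            qrels[query] = relevant
    return qrels
```

Normalizing similar forms before counting is one possible reading of step 5; counting only the exact query tag would be an equally plausible variant.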
We retain for experimentation the top 25 queries and their corresponding
relevant documents. The final collection includes 512 relevant documents with
an average of 20 relevant documents per query. To index the dataset, we used the
open-source information retrieval library Apache Lucene (footnote 7), which is
based on a modified scoring function of the vector-space model described in [54].
• Evaluation measures. In order to compare the social importance measures and
evaluate our model performance, we use recall and precision. Users are com-
monly interested in the top results; therefore, we study precision at 0.1 and 0.2
points of recall. With an average of 800 retrieved documents per query, these
recall points correspond to the first 160 documents.
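As an illustration of the measure, precision at a fixed recall point can be computed from a ranked list as sketched below. The text does not say whether precision is interpolated; this sketch assumes the usual interpolated definition (the highest precision at any rank where recall has reached the requested level), and the function name `precision_at_recall` is hypothetical.

```python
def precision_at_recall(ranking, relevant, recall_level):
    """Interpolated precision of a ranked list at a given recall level.

    ranking:      list of document ids, best first.
    relevant:     collection of relevant document ids for the query.
    recall_level: recall point in [0, 1], e.g. 0.1 or 0.2.
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0

    hits = 0
    best = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            # Precision only rises at ranks where a relevant document is found,
            # so it is enough to check those ranks.
            if hits / len(relevant) >= recall_level:
                best = max(best, hits / rank)
    return best
```

Averaging this value over all queries gives one figure per recall point for each importance measure.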
6.5.3.2 Comparison of Social Importance Measures
The social importance measures highlight key entities in the social network. They
include measures introduced in both social network analysis [49] and hyperlink
analysis [46, 47]. Their semantics vary from one social application to another.
In the context of scientific publications, the Betweenness measure is considered
an indicator of interdisciplinarity and highlights authors who connect dispersed
sectors of the scientific community. The Closeness measure, based on the shortest
paths in the graph, reflects the reachability and independence of an author within
his or her social neighborhood. The PageRank measure and the Authority score
computed by the HITS algorithm distinguish the authoritative resources in the
social network. By contrast, the Hub score computed by the HITS algorithm
identifies authors who have significant social activity and rely on authoritative
resources; these authors are called Centrals. We applied these social importance
measures to both a binary and a weighted model of the social network. We write
W-Betweenness for the Betweenness measure applied to the weighted model of the
social network, and use the same notation for the other social importance
measures.
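To make the binary versus weighted distinction concrete, the sketch below computes HITS Hub and Authority scores by power iteration on a small directed graph. It is only an illustration under stated assumptions: the edge-list format, the function name `hits`, the fixed iteration count, and the toy author graph are hypothetical and are not taken from the experiments described here.

```python
import math


def hits(edges, weighted=False, iterations=50):
    """Hub and Authority scores by power iteration.

    edges:    dict mapping (source, target) to a positive weight (assumed format).
    weighted: if False, every edge counts as 1 (binary model).
    """
    nodes = sorted({n for edge in edges for n in edge})
    w = {e: (float(wt) if weighted else 1.0) for e, wt in edges.items()}

    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}

    def normalize(scores):
        # Scale to unit Euclidean length; guard against an all-zero vector.
        norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
        return {n: v / norm for n, v in scores.items()}

    for _ in range(iterations):
        # Authority: weighted sum of the hub scores of incoming neighbors.
        auth = {n: 0.0 for n in nodes}
        for (src, dst), wt in w.items():
            auth[dst] += wt * hub[src]
        auth = normalize(auth)

        # Hub: weighted sum of the authority scores of outgoing neighbors.
        hub = {n: 0.0 for n in nodes}
        for (src, dst), wt in w.items():
            hub[src] += wt * auth[dst]
        hub = normalize(hub)

    return hub, auth


# Toy graph: author "a" points to "c" and "d" once each, "b" points to "c" five times.
toy_edges = {("a", "c"): 1, ("b", "c"): 5, ("a", "d"): 1}
hub_binary, auth_binary = hits(toy_edges)
hub_weighted, auth_weighted = hits(toy_edges, weighted=True)
```

In this toy graph the binary model favors "a" as a hub because it points to more resources, whereas the weighted model favors "b" because of its strong link to the most authoritative node.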
Table 6.2 presents comparative effectiveness results of the different importance
measures for both binary and weighted models of the social network. These results
are obtained using only the social importance score of documents by setting
α = 0 in (6.8). We note that the Hub measure better ranks scientific papers for
both binary and weighted models of the social network. We conclude that the
importance of scientific publications can be estimated as the Centrality of their
authors.
7 http://lucene.apache.org/