Terrier [OUN 05], as reported in [PER 08]. The existing GIR works (presented in
Table 2.1 of Chapter 2) have also been evaluated from the point of view of the size of
the indexes, the time needed to construct the indexes and the query times [VAI 05].
These evaluations would benefit from being put into perspective with other measures,
such as the precision or the recall of the GIRSs considered. At present, it is therefore
impossible to compare search engines designed to process the three geographic
dimensions simultaneously. In order to respond to this need, we propose an
experimental framework dedicated to GIRSs. This framework focuses on leveraging
the existing know-how of evaluation campaigns, such as TREC and GeoCLEF, while
integrating the specificities of geographic information.
We then implement this framework to evaluate our IR prototypes PIV2 and PIV3,
which are based on information indexing and retrieval by tiling.
3.5.1. Evaluation framework of geographic IRSs: proposal for a test collection and
an experimental protocol
In addition to the description of evaluation campaigns presented in Table 2.1 in
section 2.3.6, we recall that a test collection contains the following:
1) A set of n “topics” representing users' information needs. Each topic is at least
characterized by a title (a keyword-based query), a description (usually a sentence
in natural language) and a narrative (a detailed explanation of the expected
information, as well as criteria for judging a document as relevant or non-relevant).
Buckley and Voorhees [BUC 00] show that at least 25 topics are necessary to perform
statistically significant analyses. Let us note, however, that the TREC standard is 50
topics.
2) The “corpus” of documents, some of which are relevant for the proposed topics.
A regular TREC corpus for a classic ad hoc task is made up of 800,000 documents or
more [VOO 05].
3) The “qrels”, the TREC term denoting the query relevance judgments, associating
each topic with the set of relevant documents. Because the corpus is too large to be
analyzed exhaustively, IR evaluation frameworks rely on the pooling technique. Thus,
for each topic, a pool of documents is created from the top 100 documents retrieved by
the participants' IRSs, duplicates being removed. The hypothesis is that the number
and the diversity of the IRSs contributing to the pool will allow us to find most of the
relevant documents. Finally, a human assessor examines each document of the pool
in order to evaluate whether or not it matches the information need specified in the
considered topic. The document is then qualified as relevant or non-relevant (see the
sketch after this list).
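To make the pooling step concrete, here is a minimal Python sketch of how a pool
and its qrels might be assembled for one topic; the Topic structure mirrors the
components listed in point 1. All names (Topic, build_pool, make_qrels, judge) are
illustrative assumptions of ours and do not come from any actual TREC or GeoCLEF
tooling.

    from dataclasses import dataclass

    @dataclass
    class Topic:
        """A test-collection topic, as described in point 1 above."""
        number: int
        title: str        # keyword-based query
        description: str  # usually a sentence in natural language
        narrative: str    # criteria for judging relevance

    def build_pool(runs, depth=100):
        """Pooling: merge the top-`depth` documents returned by each
        participating IRS for one topic, removing duplicates while
        preserving the order in which documents are first seen."""
        pool, seen = [], set()
        for run in runs:                  # one ranked doc-id list per IRS
            for doc_id in run[:depth]:
                if doc_id not in seen:
                    seen.add(doc_id)
                    pool.append(doc_id)
        return pool

    def make_qrels(pool, judge):
        """Qrels for one topic: `judge` stands in for the human assessor
        (a hypothetical doc_id -> bool decision function)."""
        return {doc_id for doc_id in pool if judge(doc_id)}

    # Example with the runs of two systems for one topic:
    runs = [["d3", "d1", "d7"], ["d1", "d9"]]
    pool = build_pool(runs)                        # ["d3", "d1", "d7", "d9"]
    qrels = make_qrels(pool, lambda d: d in {"d1", "d7"})

Note that the pooling hypothesis rests on the diversity of the contributing runs,
not on the merging step itself: the more varied the systems that feed the pool, the
more of the truly relevant documents it is expected to contain.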
Such test collections have been developed several times in evaluation frameworks
such as TREC and GeoCLEF. Let us note that these do not take into account the three
geographic dimensions.