Spatial and Temporal Information Retrieval in Textual Corpora - Geographical Information Retrieval in Textual Corpora


Recall evaluates the capacity of a system to select all the relevant documents in

the collection (are all the relevant documents selected?) while precision evaluates the

capacity of the system to select only relevant documents (are all the selected

documents relevant?). Other measures based on these two have been

proposed [SAN 10]. For instance, mean average precision (MAP) is the mean, over a

given set of test queries, of the average precision obtained for each query.
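
The measures above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names are hypothetical, documents are assumed to be identified by plain strings, relevance is assumed to be binary, and average precision is assumed to be normalized by the total number of relevant documents (a common convention, which the text does not spell out).

```python
def precision(selected, relevant):
    """Share of the selected documents that are relevant."""
    selected, relevant = set(selected), set(relevant)
    return len(selected & relevant) / len(selected) if selected else 0.0


def recall(selected, relevant):
    """Share of the relevant documents that were selected."""
    selected, relevant = set(selected), set(relevant)
    return len(selected & relevant) / len(relevant) if relevant else 0.0


def average_precision(ranking, relevant):
    """Average of the precision values measured at each rank where a
    relevant document appears, divided by the number of relevant documents
    (assumed normalization)."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(queries):
    """Mean of average_precision over (ranking, relevant) pairs,
    one pair per test query."""
    scores = [average_precision(ranking, relevant) for ranking, relevant in queries]
    return sum(scores) / len(scores) if scores else 0.0
```

Precision and recall are computed on the unordered set of selected documents, whereas average precision takes the ranking into account, which is why it is the usual building block for comparing ranked result lists.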

The field of IR is characterized by a long history of evaluation [VOO 02]. One way

to evaluate IRSs is to organize a “campaign”, which proceeds in the

following way:

1) The organizers issue a call for participation, which presents the proposed IR

tasks. For example, an ad hoc task requires retrieving a list of relevant documents for a

given query. In contrast, for a question answering task, the aim is to retrieve a piece of

information that answers a given query. For the query “beaches of Anglet”, we would

obtain a list of documents dealing with this subject for the ad hoc task, whereas we

would get the list of the beach names of Anglet for the question answering task.

2) The interested IRS designers register for the tasks of their choice. They are then

referred to as participants.

3) The organizers provide a corpus of documents and 25+ topics representing

information needs (i.e. detailed queries with description and narrative).

4) The participants process the corpus, submit the topics to their IRS and then pass

the obtained results, also known as runs, to the organizers (e.g. per topic document list

ranked by decreasing relevance).

5) The organizers constitute a set of relevant documents for each topic: the

relevance judgments. They then check participants' results against these relevance

judgments by means of predefined appropriate measures. The computed value

represents the effectiveness (i.e. measurement of result quality) of the IRS for the

considered topic. Aggregating all the scores obtained by the IRS for each of the 25+

topics (e.g. averaging over them) leads to an overall evaluation score for the IRS.

6) The organizers publish the results of the participants and generally make

available the test collection (i.e. corpus, topics and relevance judgments). This

collection can then be reused later in order to evaluate an IRS outside the campaign

framework.
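
Step 5 of the campaign can be sketched as follows. This is an illustrative sketch under stated assumptions: a run is assumed to be a mapping from topic identifier to a ranked list of document identifiers, the relevance judgments a mapping from topic identifier to a set of relevant document identifiers, and the aggregation a plain arithmetic mean. The names `evaluate_run` and `precision_at_k`, as well as the choice of precision at rank k as the measure, are hypothetical; a real campaign defines its own measures and file formats.

```python
def precision_at_k(ranking, relevant, k=10):
    """Share of relevant documents among the first k ranked documents.
    Missing ranks (fewer than k results) count as non-relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def evaluate_run(run, judgments, measure):
    """Score a run against the relevance judgments.

    run:       topic id -> ranked list of document ids (assumed layout)
    judgments: topic id -> set of relevant document ids (assumed layout)
    measure:   function (ranking, relevant) -> effectiveness score

    Returns the per-topic scores and their mean over all judged topics.
    A topic missing from the run is scored on an empty ranking.
    """
    per_topic = {
        topic: measure(run.get(topic, []), relevant)
        for topic, relevant in judgments.items()
    }
    overall = sum(per_topic.values()) / len(per_topic) if per_topic else 0.0
    return per_topic, overall
```

Iterating over the judged topics rather than over the run means that a participant who returns nothing for a topic is penalized instead of silently skipped. Because the test collection is published after the campaign, the same function could be reused to score a new IRS against the same topics and judgments.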

As shown in Figure 2.4, TREC [VOO 05] is a reference campaign in IR allowing

us to evaluate IRSs with respect to the thematic dimension. SemEval [AGI 07] and

SemSearch [HAL 10] focus, in particular, on the semantic analysis of textual

contents. Little work has been published on the evaluation of the two

other dimensions of geographic information. The spatial and temporal dimensions

have been the object, respectively, of the evaluation framework CLEF ([PET 01], task
