Spatial and Temporal Information Retrieval in Textual Corpora - Geographical Information Retrieval in Textual Corpora

flow has detected 9,835 SFs. At the same time, in the manner of the CLEF 28

evaluation campaigns, a manual extraction and indexing of the SFs has been carried

out in order to build a set of reference SFs.

To evaluate the effectiveness of the indexing, we have chosen to use measures of

recall and precision, on the basis of a manual annotation of 10 pages for each of the 10

volumes. Evaluation reveals that the recall is equal to 0.49 and the precision is 0.73.

The low rate of recall can be explained, after analysis of the non-recognized SFs, by

the lack of geographic resources (for the validation and interpretation of these SFs),

and, to a lesser extent, by the lack of spatial indicators in the glossaries,

orthographic variations and OCR errors. The errors in precision are

mainly due to homonyms or old names unavailable in our resources. These test results

are detailed in [LES 07], which gives an analysis of the results of each step in the

indexing process. The precision of the PIV SF extraction process flow is lower than

that of the SPIRIT system presented in [CLO 05]: this system features a recall of 0.69

and a precision of 0.78; it indexes SFs of hotel, restaurant, street, postal code and

commune type and uses comprehensive corresponding geographic resources for the

validation phase.
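
The recall and precision measures used above can be illustrated with a minimal Python sketch. It assumes that extracted and reference SFs are compared as exact (page, surface form) pairs, which may differ from the matching rules actually used in [LES 07]; the toy data and the function names are hypothetical. The F1 score is not reported in the text and is computed here only as a convenient single-number comparison of the reported figures.

```python
def precision_recall(extracted, reference):
    """Return (precision, recall) of extracted items against a reference set."""
    extracted, reference = set(extracted), set(reference)
    true_positives = len(extracted & reference)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (not a figure given in the text)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: (page id, surface form) pairs, not taken from the corpus.
reference = {(1, "Pau"), (1, "Gavarnie"), (2, "Lourdes"), (2, "Ossau")}
extracted = {(1, "Pau"), (2, "Lourdes"), (2, "Paris")}

p, r = precision_recall(extracted, reference)
print(f"toy sample: precision={p:.2f} recall={r:.2f}")

# (precision, recall) pairs as reported in the text for the two systems.
reported = {"PIV": (0.73, 0.49), "SPIRIT": (0.78, 0.69)}
for system, (prec, rec) in reported.items():
    print(f"{system}: F1={f1_score(prec, rec):.2f}")
```

In this sketch a missed SF lowers recall only, while a spurious SF (for example a homonym) lowers precision only, which mirrors the two error analyses given above.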

The evaluation of the temporal indexing process flow [LEP 07] was carried out on

a smaller sample size composed of text extracted from these same topics. The PIV

process flow detected 540 TFs. Evaluation reveals that the recall is equal to 0.91 and

the precision is 0.97. These strong results need to be put into perspective: they can

be explained by the fact that the implemented grammars have been defined from the

preliminary study of a large part of the sample, which then served for the evaluation.
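
The caveat above concerns evaluating on the same sample that was used to define the grammars. One common way to limit this bias, shown here only as a sketch and not as the protocol followed in [LEP 07], is to set aside a held-out part of the sample before any grammar is written. The identifiers, the 50/50 ratio and the function name `split_sample` are assumptions made for illustration.

```python
import random


def split_sample(doc_ids, dev_fraction=0.5, seed=0):
    """Shuffle ids reproducibly and split them into (development, held_out)."""
    ids = sorted(doc_ids)              # sort first so the result depends only on the seed
    random.Random(seed).shuffle(ids)   # local generator, global random state untouched
    cut = int(len(ids) * dev_fraction)
    return ids[:cut], ids[cut:]


# Hypothetical paragraph identifiers standing in for the evaluation sample.
paragraph_ids = [f"p{i:03d}" for i in range(20)]

dev, held_out = split_sample(paragraph_ids, dev_fraction=0.5, seed=42)

# Grammars would be defined by studying `dev` only;
# recall and precision would then be measured on `held_out` only.
print(len(dev), "development paragraphs;", len(held_out), "held-out paragraphs")
```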

2.4.6.2. Evaluation of the PIV IRS

Experiments relative to the spatial and temporal IR supported by the PIV prototype

are described in [SAL 07a], [SAL 07b] and [PAL 10a].

The spatial IR experiment focuses on a sample of texts extracted from these same

topics, composed of 1,019 paragraphs, which corresponds to 1,028 SFs (902 ASFs

and 126 RSFs). The protocol contains 40 queries: 15 queries focus on ASFs (five

with small spatial range such as “pass”, five with intermediate range such as

“commune” and five of large range such as “region”); 25 focus on RSFs (five for

each type: adjacency, inclusion, orientation, distance and union). Three people have

conducted a pooling-type assessment phase described in [PAL 10a]. The evaluation

of these PIV spatial IR results gives a mean average precision (MAP) of 0.62, higher than that of the

SPIRIT GIR system, which is equal to 0.40 [PUR 07]. Let us note that the SPIRIT

system used 38 queries also referring to RSFs with relationships of adjacency,

28 Cross-Language Evaluation Forum (CLEF): http://www.clef-campaign.org/.
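
The MAP figure quoted for the spatial IR experiment can be illustrated with a minimal Python sketch of mean average precision. It assumes binary relevance judgments (such as those produced by a pooling-type assessment) and treats unjudged results as non-relevant; the exact settings used in [PAL 10a] may differ. The document identifiers and the two toy queries are hypothetical.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked result list against a set of relevant ids."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank   # precision at each relevant rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of the per-query average precisions; runs is a list of (ranked, relevant)."""
    if not runs:
        return 0.0
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


# Two hypothetical queries with their ranked results and judged-relevant sets.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
    (["d5", "d6", "d7"], {"d6"}),
]

map_score = mean_average_precision(runs)
print(f"MAP over {len(runs)} toy queries: {map_score:.2f}")
```

Because MAP averages over queries, comparing two systems is most meaningful when both are run on the same query set and judgments.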
