Spatial and Temporal Information Retrieval in Textual Corpora - Geographical Information Retrieval in Textual Corpora

2. Spatial and Temporal Information Retrieval in Textual Corpora

2.1. Introduction

Information retrieval (IR) systems intended for the wider public do not offer specific processing of the spatial or temporal information contained in corpora or in search criteria. Nevertheless, in numerous cases, this information could play an important role in calculating the relevance of a document [TEI 11]. Taking the semantics of spatial and temporal expressions into account could enable finer processing of queries such as “musical instruments in the vicinity of Laruns at the beginning of the 19th century”. Most of the time in IR, however, documents are processed from the viewpoint of their textual content as mere “bags” of independent words. Moreover, beyond the textual content, document-specific information could be taken into consideration, such as the structuring into sections and paragraphs.

Our context, that of textual corpora with “territorial” denotations, is specific. On the one hand, spatial and temporal references are frequent and, on the other hand, the document repositories are sufficiently stable and homogeneous to warrant specific back-office processing. Our work thus differs from classic IR in that it aims at a thorough processing of content: specific process flows target the recognition and then the interpretation of spatial and temporal information. From a structural point of view, the documents at our disposal come from basic digitization efforts integrating only character and paragraph recognition. They are in text format and generally run to several tens or hundreds of pages. This is why we believe the entry point into the corpus cannot be the document itself, and we propose working with paragraphs as document units.
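
The choice of paragraphs as document units, combined with the “bag of words” view mentioned earlier, can be illustrated with a minimal sketch. It assumes that paragraphs in the digitized text are separated by blank lines; the function names and the `doc1#p1` identifier scheme are hypothetical and are not taken from the authors' system.

```python
import re
from collections import Counter


def split_paragraphs(text):
    """Split raw text into paragraphs, assuming blank lines separate them."""
    parts = re.split(r"\n\s*\n", text)
    # Collapse internal line breaks and drop empty fragments.
    return [" ".join(p.split()) for p in parts if p.strip()]


def bag_of_words(paragraph):
    """Lowercased word counts; word order and phrase structure are discarded."""
    return Counter(re.findall(r"\w+", paragraph.lower()))


def index_document(doc_id, text):
    """Return one (unit_id, bag) pair per paragraph (hypothetical id scheme)."""
    return [
        (f"{doc_id}#p{i}", bag_of_words(p))
        for i, p in enumerate(split_paragraphs(text), start=1)
    ]


# Illustrative sample text, not taken from the corpus.
sample = "Laruns lies in the\nOssau valley.\n\nIn 1820 the village\nheld a fair."
units = index_document("doc1", sample)
for unit_id, bag in units:
    print(unit_id, dict(bag))
```

Each paragraph becomes its own retrieval unit, but a bag treats “Laruns” and “1820” as ordinary tokens: nothing in it records that one is a place and the other a date, which is the limitation the specific processing described above is meant to address.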

As recommended by Clough et al. [CLO 06], we deal independently with the spatial and temporal dimensions: this way, the single-dimension IR and the operation for