Access by Geographic Content to Textual Corpora: What Orientations?
1.1. Introduction
The volume of digital corpora keeps growing, and retrieving relevant documents is
becoming an increasingly difficult task. The ambiguity of natural language terms
adds to this difficulty, both when automatically interpreting how an information need
is expressed and when automatically assessing how well documents match that need.
Because terms have multiple meanings and are used in widely varying contexts,
information retrieval is a genuinely difficult task. Our working hypothesis therefore
consists of distinguishing the spatial, temporal and thematic dimensions in order to
implement dedicated approaches in the indexing and information retrieval (IR)
processes. The objective is to contribute both to a better analysis of the content of
textual corpora and to a better grasp of the search criteria expressed in a query.
Recall that we are studying digitized textual corpora with a “territorial” connotation,
to which optical character recognition has been applied but whose logical structure
has not been preserved.
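To make the working hypothesis more concrete, here is a minimal Python sketch. It is an illustrative assumption, not the authors' implementation. Each document carries separate spatial, temporal and thematic footprints, each criterion of a query is evaluated by its own dedicated test, and the partial results are then combined. The `Document` class, the bounding-box and year-interval representations, the `search` function and the toy coordinates are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    # Hypothetical spatial footprint: bounding box (min_lon, min_lat, max_lon, max_lat).
    bbox: tuple[float, float, float, float]
    # Hypothetical temporal footprint: inclusive year interval (start, end).
    years: tuple[int, int]
    # Thematic footprint: set of lowercase keywords.
    terms: set[str] = field(default_factory=set)


def bbox_overlaps(a, b):
    """Dedicated spatial test: True if two bounding boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def interval_overlaps(a, b):
    """Dedicated temporal test: True if two inclusive year intervals intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def search(docs, bbox=None, years=None, terms=None):
    """Evaluate each dimension separately, then intersect the partial results.

    A criterion left as None is not applied.
    """
    candidates = {d.doc_id for d in docs}
    if bbox is not None:
        candidates &= {d.doc_id for d in docs if bbox_overlaps(d.bbox, bbox)}
    if years is not None:
        candidates &= {d.doc_id for d in docs if interval_overlaps(d.years, years)}
    if terms is not None:
        wanted = {t.lower() for t in terms}
        # Thematic test: the document must contain every query keyword.
        candidates &= {d.doc_id for d in docs if wanted <= d.terms}
    return sorted(candidates)


if __name__ == "__main__":
    # Toy corpus with made-up footprints.
    docs = [
        Document("d1", (-1.5, 43.0, 0.0, 43.5), (1850, 1870), {"bridge", "river"}),
        Document("d2", (2.0, 48.5, 2.5, 49.0), (1900, 1910), {"bridge"}),
    ]
    # A query combining a place, a period and a theme.
    print(search(docs, bbox=(-1.0, 43.2, -0.5, 43.4), years=(1860, 1865), terms=["Bridge"]))
    # → ['d1']
```

Intersection is only the simplest way to combine the dimensions. A multicriteria engine might instead compute a relevance score per dimension and merge these scores into a single ranking.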
This chapter is organized as follows. Section 1.2 presents the general context of
geographic information retrieval (GIR). Section 1.3 introduces the main research
fields concerned and positions our study among them. Section 1.4 outlines our
research approach to building spatial, temporal and multicriteria search engines.
1.2. Access by geographic content to textual corpora
This work on processing information in text is described in detail mainly in the
theses [BAZ 05, LES 07, PAL 10a, KER 11]. After a brief review of document
retrieval and textual corpora, we will describe the characteristics
