users in their query. For instance, if we wish to find documents related to events
associated with the south of Pau, the search engine will target the terms “south” and
“Pau”. However, a document referring to “Jurançon”, which is a bordering commune
to that of Pau and situated to its south, should also be returned. Similarly, for
temporal information, if we wish to find documents describing events related to the
19th century, the search engine should return not only the documents which contain
“19th century” but also those which contain “1801”, “1802”, etc.
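
The two expansions described above can be illustrated with a minimal sketch. It is not the method developed in this book: the dictionary SOUTH_NEIGHBOURS and the three function names are hypothetical, and the only gazetteer entry is the Pau/Jurançon example given in the text. The century is assumed to run from 1801 to 1900, in line with the years quoted above.

```python
import re

# Hypothetical toy gazetteer: communes bordering a place on its southern side.
# The single entry is the example from the text; a real system would rely on
# a geographic resource instead.
SOUTH_NEIGHBOURS = {"Pau": {"Jurançon"}}


def century_to_years(century):
    """Return the first and last year of a century, e.g. 19 -> (1801, 1900)."""
    return 100 * (century - 1) + 1, 100 * century


def matches_south_of(text, place):
    """True if the text names a commune recorded as lying south of `place`."""
    return any(name in text for name in SOUTH_NEIGHBOURS.get(place, set()))


def matches_century(text, century):
    """True if the text contains a four-digit year falling within the century."""
    first, last = century_to_years(century)
    years = (int(y) for y in re.findall(r"\b\d{4}\b", text))
    return any(first <= y <= last for y in years)
```

With these definitions, a paragraph mentioning only “Jurançon” satisfies a query on the south of Pau, and a paragraph mentioning only “1802” satisfies a query on the 19th century, although neither contains the query terms themselves.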
Finally, an experienced user interested in documents related to the “Pyrénées
mountains but not those of Gavarnie, in the 19th century, if possible, unrelated to
ascents” must be able to express this type of information need and navigate the set
of resulting text units (paragraphs). To satisfy such needs, the construction of precise
indexes adapted to each type of information (spatial, temporal and thematic) seems
necessary. The aim is thus to improve GIR by combining the results obtained from
dedicated spatial and temporal processes with those of classic IR strategies,
generally employed for thematic criteria.
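
As an illustration only, the combination of spatial, temporal and thematic criteria for the query quoted above might be sketched as follows. Everything here is an assumption made for the example: the TextUnit structure, the search function, the sample paragraphs and the reading of the query, in which the place and the period are treated as mandatory and “if possible, unrelated to ascents” as a preference that only affects ranking.

```python
from dataclasses import dataclass, field


@dataclass
class TextUnit:
    """Hypothetical paragraph with one set of annotations per index type."""
    ident: str
    places: set = field(default_factory=set)   # spatial index
    years: set = field(default_factory=set)    # temporal index
    themes: set = field(default_factory=set)   # thematic index


def search(units, place, excluded_place, period, avoided_theme):
    """Filter on space and time, then rank units avoiding the theme first."""
    first, last = period
    ranked = []
    for unit in units:
        # Spatial criterion: the place is required, the excluded place rejected.
        if place not in unit.places or excluded_place in unit.places:
            continue
        # Temporal criterion: at least one year inside the requested period.
        if not any(first <= year <= last for year in unit.years):
            continue
        # Thematic criterion, read here as a soft preference.
        ranked.append((avoided_theme not in unit.themes, unit.ident))
    ranked.sort(key=lambda item: not item[0])  # preferred units come first
    return [ident for _, ident in ranked]


# Invented sample data, for illustration only.
UNITS = [
    TextUnit("p3", {"Pyrénées"}, {1875}, {"ascent"}),
    TextUnit("p2", {"Pyrénées", "Gavarnie"}, {1860}),
    TextUnit("p1", {"Pyrénées"}, {1850}, {"botany"}),
    TextUnit("p4", {"Pyrénées"}, {1920}),
]
RESULT = search(UNITS, "Pyrénées", "Gavarnie", (1801, 1900), "ascent")
```

Here p2 is rejected on the spatial criterion and p4 on the temporal one, while p1 is ranked ahead of p3 because it does not deal with ascents. Keeping one set of annotations per index type is what allows each criterion to be evaluated by its own dedicated process before the results are merged.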
1.3. Reinforcement of GIR by contributions from NLP, reasoning and
If we consider the Association for Computing Machinery (ACM) classification 11,
our study is related to section H.3 INFORMATION STORAGE AND RETRIEVAL and,
in particular, to subsections H.3.1 Content Analysis and Indexing, H.3.3 Information
Search and Retrieval and H.3.7 Digital Libraries. It concerns IR and, in particular,
GIR in textual document repositories.
However, as we have already shown, our field of research differs from classic
IR on a large number of points. We are interested in textual document
repositories that are stable (a priori, a given document of the repository is never
updated) and homogeneous in their style of expression (such as travelogues, walk
itineraries and tourist guides). This particularity enables thorough processing, for
back-office indexing on the one hand and for specific usage scenarios on the other.
Concerning indexing, natural language processing (NLP) supports the targeted
extraction and analysis of spatial and temporal information, while qualitative
reasoning completes this analysis and supports the interpretation of this information
as well as that of associated relations. Thus, Figure 1.3 positions our study
of GIR at a crossroads between IR, NLP and qualitative reasoning. This can
involve specialized IR dedicated to the vocabulary specific to the expression of space
and time. We propose an active parsing of the textual document, in other words a
targeted search for expected elements of information in the text, in order to build the
corresponding spatial and temporal meaning of the discourse.
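
The idea of a targeted search for expected elements can be hinted at with a deliberately naive sketch. It is an assumption for illustration, not the parsing approach proposed here: the two patterns and the function names are hypothetical, and they only recognize a cardinal direction followed by a capitalized place name and an ordinal followed by the word “century”.

```python
import re

# Hypothetical patterns for two kinds of expected elements.
# Spatial: a cardinal direction, "of", then a capitalized name ("south of Pau").
SPATIAL = re.compile(r"\b(north|south|east|west) of ([A-Z][\w-]+)")
# Temporal: an ordinal number followed by "century" ("19th Century").
CENTURY = re.compile(r"\b(\d{1,2})(?:st|nd|rd|th) century\b", re.IGNORECASE)


def extract_spatial(text):
    """Return (relation, place) pairs found in the text."""
    return [(m.group(1), m.group(2)) for m in SPATIAL.finditer(text)]


def extract_centuries(text):
    """Return the centuries mentioned in the text as integers."""
    return [int(m.group(1)) for m in CENTURY.finditer(text)]
```

Each extracted pair keeps the relation (“south”) together with its reference place (“Pau”), which is the kind of information that qualitative reasoning would then have to interpret.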
11 ACM Computing Classification System - http://dl.acm.org.