Access by Geographic Content to Textual Corpora: What Orientations? - Geographical Information Retrieval in Textual Corpora

users in their query. For instance, if we wish to find documents related to events
associated with the south of Pau, the search engine will target the terms “south” and
“Pau”. However, a document referring to “Jurançon”, a commune that borders Pau
to the south, should also be returned. Similarly, for temporal information, if we
wish to find documents describing events from the 19th century, the search engine
should return not only the documents that contain “19th century” but also those
that contain “1801”, “1802”, etc.
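The two expansions described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the system discussed in the text: the gazetteer `SOUTH_NEIGHBOURS` is a hypothetical toy table holding just the Pau/Jurançon relation given above, and the function names are invented for the example.

```python
import re

# Hypothetical toy gazetteer: only the Pau/Jurançon relation comes from the text.
SOUTH_NEIGHBOURS = {"Pau": ["Jurançon"]}


def century_to_years(century: int) -> range:
    """Years covered by the Nth century, e.g. 19 covers 1801 to 1900."""
    return range((century - 1) * 100 + 1, century * 100 + 1)


def expand_spatial(direction: str, place: str) -> set:
    """Place names standing for '<direction> of <place>'; only 'south' is modelled."""
    if direction != "south":
        return set()
    return set(SOUTH_NEIGHBOURS.get(place, []))


def matches_century(text: str, century: int) -> bool:
    """True if the text mentions any four-digit year falling inside the century."""
    years = century_to_years(century)
    return any(int(y) in years for y in re.findall(r"\b\d{4}\b", text))
```

With these helpers, a query on "south of Pau" would also look for "Jurançon", and a query on the 19th century would accept a paragraph that only mentions "1802". A real system would presumably rely on geometric footprints and a full temporal model rather than a lookup table.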

Finally, an experienced user interested in documents related to the “Pyrénées
mountains but not those of Gavarnie, in the 19th century and, if possible, unrelated
to ascents” must be able to express this type of information need and navigate the
set of resulting text units (paragraphs). To satisfy such needs, the construction of
precise indexes adapted to each type of information (spatial, temporal and thematic)
seems necessary. The aim is thus to improve GIR by combining the results obtained
from dedicated spatial and temporal processes with those of classic IR strategies,
which are generally employed for thematic criteria.
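The combination of criteria in the example query can be sketched as below. This is a hedged illustration only: the `TextUnit` structure, the `search` function and the four sample paragraphs are hypothetical, and the treatment of "not those of Gavarnie" as a simple exclusion of units mentioning Gavarnie is a simplifying assumption. Spatial and temporal criteria act as hard filters, while "if possible, unrelated to ascents" is handled as a soft preference that only affects ranking.

```python
from dataclasses import dataclass, field


@dataclass
class TextUnit:
    """A paragraph with (hypothetical) pre-computed spatial, temporal and thematic index entries."""
    uid: str
    places: set = field(default_factory=set)
    years: set = field(default_factory=set)
    themes: set = field(default_factory=set)


def search(units, place, not_place, years, avoid_theme):
    """Filter on the hard criteria, then rank units that avoid the soft theme first."""
    hits = [
        u for u in units
        if place in u.places
        and not_place not in u.places
        and any(y in years for y in u.years)
    ]
    # Soft criterion: False sorts before True, so units without the theme lead.
    return sorted(hits, key=lambda u: avoid_theme in u.themes)


# Invented sample data, for illustration only.
units = [
    TextUnit("p1", {"Pyrénées"}, {1835}, {"ascent"}),
    TextUnit("p2", {"Pyrénées"}, {1850}, {"botany"}),
    TextUnit("p3", {"Pyrénées", "Gavarnie"}, {1860}, {"botany"}),
    TextUnit("p4", {"Pyrénées"}, {1923}, {"botany"}),
]
ranked = [u.uid for u in search(units, "Pyrénées", "Gavarnie", range(1801, 1901), "ascent")]
```

Here `p3` is dropped because it mentions Gavarnie, `p4` because it falls outside the 19th century, and `p1` is kept but ranked after `p2` because it concerns an ascent.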

1.3. Reinforcement of GIR by contributions from NLP, reasoning and multicriteria IR

If we consider the Association for Computing Machinery (ACM) classification¹¹,
our study is related to section H.3 INFORMATION STORAGE AND RETRIEVAL and,
in particular, to subsections H.3.1 Content Analysis and Indexing, H.3.3 Information
Search and Retrieval and H.3.7 Digital Libraries. It concerns IR and, in particular,
GIR in textual document repositories.

However, as we have already shown, our field of research differs from classic
IR on many points. We are interested in textual document repositories that are
stable (a priori, a given document of the repository is never updated) and
homogeneous in their style of expression (such as travelogues, walk itineraries and
tourist guides). This particularity enables thorough processing, on the one hand for
back-office indexing and, on the other, for specific usage scenarios.

Concerning indexing, natural language processing (NLP) supports the targeted
extraction and analysis of spatial and temporal information, while qualitative
reasoning completes this analysis and supports the interpretation of this information
as well as that of the associated relations. Thus, Figure 1.3 positions our study
of GIR at the crossroads of IR, NLP and qualitative reasoning. This can
involve specialized IR dedicated to the vocabulary used to express space and
time. We propose an active parsing of the textual document, in other words a targeted
search for expected elements of information in the text, in order to build the
corresponding spatial and temporal meaning of the discourse.
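The idea of a targeted search for expected elements can be illustrated with a few patterns. The sketch below uses plain regular expressions as a deliberately simple stand-in; it is not the parsing method proposed in the text, and the pattern names and the `extract` function are assumptions made for the example.

```python
import re

# Expected spatial element: a cardinal direction followed by a capitalized place name.
SPATIAL = re.compile(r"\b(north|south|east|west) of ([A-Z][\w-]+)")
# Expected temporal elements: an ordinal century, or a four-digit year.
CENTURY = re.compile(r"\b(\d{1,2})(?:st|nd|rd|th) [Cc]entury\b")
YEAR = re.compile(r"\b(1\d{3}|20\d{2})\b")


def extract(text):
    """Return the spatial and temporal elements found in the text."""
    return {
        "spatial": [(m.group(1), m.group(2)) for m in SPATIAL.finditer(text)],
        "centuries": [int(m.group(1)) for m in CENTURY.finditer(text)],
        "years": [int(m.group(1)) for m in YEAR.finditer(text)],
    }


found = extract("We walked south of Pau in 1802, as was common in the 19th century.")
```

Only the expected forms are searched for, and everything else in the sentence is ignored. Interpreting the extracted elements, for example deciding which places lie south of Pau, would be left to a later reasoning step.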

11 ACM Computing Classification System - http://dl.acm.org.
