3
Multicriteria Information Retrieval in
Textual Corpora
3.1. Introduction
Although regular search engines already retrieve good results for keyword-based
searches, Kanhabua and Nørvåg [KAN 08] observed that, on specific corpora, the
precision of geographic retrieval is weak. Indeed, users spend a lot of time exploring
the retrieved documents in order to keep only the documents which satisfy their needs.
For example, within a larger query, the temporal expression “the 1810s” submitted to
a classic search engine leads to the retrieval of documents containing “1810” and not
“1811”, “1812”, etc. Similarly, the spatial expression “around Anglet” submitted to a
classic search engine leads to the retrieval of results containing “Anglet” and not “the
gulf of Chiberta”, “the beach of Cavaliers”, “Bayonne”, “Biarritz”, etc.
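The temporal case above can be illustrated with a minimal sketch. It assumes that a decade expression such as "the 1810s" is interpreted as the ten calendar years it covers, and that a document matches if it mentions any of them. The function names `expand_decade` and `matches_decade` are hypothetical and are not part of the PIV prototype.

```python
import re


def expand_decade(expression: str) -> list[int]:
    """Expand a decade expression such as "the 1810s" into its ten years.

    Assumption: a decade is written as four digits ending in 0, followed by "s".
    """
    match = re.search(r"\b(\d{3})0s\b", expression)
    if match is None:
        raise ValueError(f"no decade expression found in {expression!r}")
    start = int(match.group(1)) * 10
    return list(range(start, start + 10))


def matches_decade(document_text: str, expression: str) -> bool:
    """Return True if the text mentions any year covered by the decade."""
    years = expand_decade(expression)
    return any(re.search(rf"\b{year}\b", document_text) for year in years)
```

With this interpretation, a paragraph mentioning "1812" is matched by the query "the 1810s", whereas plain keyword matching on "1810" would miss it. The spatial expression "around Anglet" would need a comparable expansion over a geographic resource, which is not sketched here.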
A means of enhancing the efficiency of search engines is then to take into account
not only the thematic aspects, but also the spatial and temporal aspects. In Chapter 2,
we dealt with spatial and temporal dimensions as privileged entry points to the texts.
The PIV project we presented is mainly based on the development of process flows
dedicated to the indexing and retrieval of spatial and temporal information. A
prototype, corresponding to each process flow, extracts and indexes the information
from textual documents and proposes a search engine which, based on spatial
[GAI 08] or temporal [LEP 07] criteria, returns paragraphs of documents.
Our objective now is to propose a better approach that combines standard IR
services with specific ones dedicated to spatial IR and temporal IR. We propose a
unified model representing each of the three dimensions and aggregating the results
from different specialized IRSs which we implement in a new version of the PIV
prototype. To validate our multi-dimensional approach, we also propose an
evaluation framework for geographic IRSs (GIRSs). The proposed evaluation