extraction and semantic analysis of such data in order to build the corresponding
annotations [LEP 07, GAI 08]. As an extension to this approach, the following
propositions target the enrichment of these symbolic annotations by numeric
representations (geometries and calendar periods), generally approximated, stored in
the indexes as entry points to noun phrases and corresponding document fragments.
They also cover the matching of document fragments with the user need expressed in
such an IR context [SAL 07a, LEP 07, SAL 08, SAL 09].
2.3. Spatial and temporal information in textual documents: literature review
We summarize the work on modeling and reasoning, and then on linguistic
processing, in the context of spatial and temporal information annotation,
indexing and retrieval in textual documents.
2.3.1. Geographic information in text and IR
Several authors propose a molecular definition of geographic information
[USE 96, GAI 01, LON 05, LOU 08a]. According to these authors, geographic
information links a space, often a time and sometimes descriptive properties. They
use a metaphor from chemistry by underlining the atomic character of the spatial,
temporal and descriptive components of geographic information (see Figure 1.2). We
work with geographic information expressed in natural language. This information is
scattered throughout the text, which makes it difficult to recognize and interpret
for the numeric representation required in the IR phase. The geographic information
must therefore be identified and converted into data allowing us to take advantage of
their specificity. A process based on the recognition of spatial, then temporal, named
entities (NEs) followed by semantic analysis of the text allows the detection of some
of the spatial (or temporal) information of a document and its association with a
symbolic representation: for example, “to the south of Pau” is specified by a relation
of orientation applied to the municipality of Pau and “beginning of January 2010” is
represented by a relation of inclusion applied to the month of January 2010.
Nevertheless, in order to support the operations of IR, it is then necessary to calculate
a numeric representation (geometry and calendar period) corresponding to every
piece of information recognized and analyzed in this manner. Moreover, the pieces of
detected information can be subjective or dependent on the context in which they are
invoked and, as a result, the associated numeric representations always imply a certain
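
As an illustration of the step from a symbolic to a numeric representation described above, the following minimal Python sketch approximates the two examples given in the text. It is not the method of the cited works: the names BBox, south_of and beginning_of_month are hypothetical, the bounding box for Pau is a hand-picked illustrative value, and the two approximation rules (a zone of the same size placed directly south of the reference entity, and the first third of the month) are assumptions chosen only to make the idea concrete.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box in decimal degrees (longitude/latitude)."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float


def south_of(ref: BBox) -> BBox:
    """Approximate 'to the south of <ref>' (relation of orientation).

    Assumption: the target zone has the same size as the reference
    entity and lies directly below it.
    """
    height = ref.max_lat - ref.min_lat
    return BBox(ref.min_lon, ref.min_lat - height, ref.max_lon, ref.min_lat)


def beginning_of_month(year: int, month: int) -> Tuple[date, date]:
    """Approximate 'beginning of <month>' (relation of inclusion).

    Assumption: 'beginning' means the first third of the month.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, days_in_month // 3)


# Illustrative, hand-picked box around Pau; not an authoritative geometry.
PAU = BBox(min_lon=-0.40, min_lat=43.28, max_lon=-0.33, max_lat=43.33)

zone = south_of(PAU)                  # numeric form of "to the south of Pau"
period = beginning_of_month(2010, 1)  # numeric form of "beginning of January 2010"
```

Both results are deliberately approximate: a different reading of "south of" or "beginning of" would yield a different geometry or calendar period, which is why such numeric representations are best treated as approximations in the index.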
2.3.2. Named entities
The notion of NE is widely associated with the establishment of the evaluation
campaigns of Message Understanding Conferences (MUC) systems [CHI 97]. These
