by a larger public. It should be noted that, given the cost of the operation, this
digitization was carried out by a provider without correcting errors or recovering
the documents' structure, with the exception of their division into
Let us recall that this corpus is composed of documents of different types (literary
which have the common denominator of dealing with the Pyrénées territory in the 18th
and 19th centuries. A preliminary study of the corpus has revealed a predominant
geographic connotation in the documents, as much in the literary studies dealing with
travelogues as in the local periodicals whose articles relate to information about the
territory. An experiment allowed us, for example, to extract almost 10,000
spatial named entities from 10 topics within the corpus (i.e. 600,000 terms).
Indeed, much of the information refers to places, spatial indications, descriptions
of landscape, temporal indicators and dates, which makes these documents
particularly significant from a geographic standpoint. Let us consider, as an example,
travelogues (see the excerpt in Figure 1.1). Their authors use, most of the time, an
identical structure: the text is divided into sections, each describing a portion of
their travel. Each portion can consist of the description of an itinerary, a stage, a
point of view, an observation, an event, etc.
Figure 1.1. Document excerpt - The Travel to the Pyrénées, James David
Forbes, CAIRN Editions (1835)
Figure 1.1 shows two paragraphs from the travel journal of James David
Forbes. In it we can find toponyms such as the “pass of Torre” whose
