Regarding spatial NEs, we can cite the CasEN project [MAU 11], which defines
different typologies of toponyms used in finite-state transducers for NER. Other
studies [BUS 08, BOU 09] categorize toponyms after identifying them.
The approach proposed by Bouamor [BOU 09] mainly takes
advantage of the structure of documents: for example, in the collaborative Wikipedia
encyclopedia, the identification of NEs is done in the title and their categorization is
based on the analysis of the first sentence of the corresponding description or
category sections. The approach proposed by Buscaldi and Rosso [BUS 08], and
Buscaldi [BUS 09a] aims, in particular, at the disambiguation of recognized toponyms.
Martins et al. [MAR 08b] and Alonso et al. [ALO 11] describe the difficulties as
of IR.
We have adopted a hybrid approach that, like Buscaldi and Rosso [BUS 08] and
Bouamor [BOU 09], marks place names and, as proposed by Maurel et al. [MAU 11],
also uses external resources (here, ontological ones) to find the terms
associated with them in order to analyze their geographic scope (e.g.
populated place, road, stream, park). We present the modeling languages for such
information in the following section.
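The hybrid idea described above can be sketched roughly as follows. This is only an illustrative sketch, not the system described in this chapter: the gazetteer `GAZETTEER`, the ontology fragment `FEATURE_TYPES`, the function `annotate` and the three-token look-back window are all hypothetical choices made for the example.

```python
import re

# Hypothetical gazetteer of place names (illustrative only).
GAZETTEER = {"Jurançon", "Pau"}

# Hypothetical ontology fragment: associated term -> geographic feature type.
FEATURE_TYPES = {
    "vineyard": "farmland",
    "town": "populated place",
    "road": "road",
    "stream": "stream",
    "park": "park",
}

WINDOW = 3  # assumed number of preceding tokens inspected for an associated term


def annotate(sentence):
    """Return (toponym, associated_term, feature_type) triples.

    Each gazetteer name found in the sentence is marked, then the nearest
    preceding ontology term inside the window is attached to it, as in
    "the vineyard of Jurançon".
    """
    tokens = re.findall(r"\w+", sentence)
    results = []
    for i, token in enumerate(tokens):
        if token not in GAZETTEER:
            continue
        window = tokens[max(0, i - WINDOW):i]
        term = next(
            (t.lower() for t in reversed(window) if t.lower() in FEATURE_TYPES),
            None,
        )
        results.append((token, term, FEATURE_TYPES.get(term)))
    return results


if __name__ == "__main__":
    print(annotate("Henry Russell admired the vineyard of Jurançon "
                   "in the late summer of 1856"))
```

A toponym with no associated term in the window is still marked, with `None` for both the term and the feature type, which mirrors the distinction between marking place names and qualifying their geographic scope.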
2.3.3. Modeling languages
Markup Language (XML) that allows the structuring of information. Let us take the
example of the sentence “Henry Russell admired the vineyard of Jurançon in the late
summer of 1856” with the markup of NEs according to the ENAMEX standard
(proposed within the MUC campaigns in the 1990s), illustrated in Listing 2.1.
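To make the markup convention concrete, the following minimal sketch wraps annotated character spans of the example sentence in MUC-style tags. The helper names `span` and `to_enamex` are hypothetical, and the choice of `ENAMEX` types `PERSON` and `LOCATION` and of a `TIMEX` element of type `DATE` is an assumption about how this sentence would be annotated under the MUC conventions.

```python
from xml.sax.saxutils import escape


def span(text, phrase, element, type_):
    """Locate the first occurrence of phrase and describe it as an annotation."""
    start = text.index(phrase)
    return (start, start + len(phrase), element, type_)


def to_enamex(text, spans):
    """Wrap non-overlapping (start, end, element, type) spans in inline tags."""
    out = []
    pos = 0
    for start, end, element, type_ in sorted(spans):
        out.append(escape(text[pos:start]))  # plain text before the entity
        out.append(f'<{element} TYPE="{type_}">'
                   f'{escape(text[start:end])}</{element}>')
        pos = end
    out.append(escape(text[pos:]))  # plain text after the last entity
    return "".join(out)


sentence = ("Henry Russell admired the vineyard of Jurançon "
            "in the late summer of 1856")
annotations = [
    span(sentence, "Henry Russell", "ENAMEX", "PERSON"),
    span(sentence, "Jurançon", "ENAMEX", "LOCATION"),
    span(sentence, "1856", "TIMEX", "DATE"),
]

if __name__ == "__main__":
    print(to_enamex(sentence, annotations))
```

Keeping the annotations as offsets and generating the tags at the end means the same spans could be serialized to another markup language without touching the recognition step.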
admired the vineyard of
in the late summer of
Listing 2.1. Example of ENAMEX markup (“Henry Russell admired the vineyard of
Jurançon in the late summer of 1856”)
Spatial information markup languages
There exist different modeling languages of spatial information, each with a
particular objective: the exchange of data in the case of Geography Markup