knowledge bases, this qualitative approach consists of describing spatial,
temporal and thematic information using terms that share the same meaning (toponyms,
dates, events, etc.). It is a form of vocabulary extension through the use of specific
gazetteers, which reduces the three geographic dimensions to this qualitative
expansion. This line of thought also addresses challenge no. 7.
4.2.1. Intradimensional axis
We have developed a GIR system combining spatial, temporal and thematic IRSs.
However, the thematic part is currently limited to the full-text Terrier IRS
[OUN 05]. As with the spatial and temporal dimensions, we propose generalizing
the thematic information in the first-level index created by the Terrier
IRS. These perspectives are presented in section 4.2.1.1.
Concerning the spatial dimension, our work has so far relied in part on
representations of spatial information approximated by bounding boxes. We consider
more accurate modes of spatial representation in section 4.2.1.2.
4.2.1.1. Thematic dimension
We propose the use of a domain ontology: its concepts can be likened to the
tiles generated for the spatial or temporal dimensions (e.g. administrative or calendar
grids, respectively). Thus, generalizing thematic information by tiling results
in the creation of a second-level index, which is more synthetic and covers a broader
scope than individual terms. It can be implemented in the following steps:
- Design of a thematic grid from a domain ontology: each concept corresponds to
a tile, and each hierarchical level of the ontology corresponds to a level of abstraction
(e.g. a botanic ontology composed of a classification of type kingdom, class, order,
family, etc.).
- Analysis of text and extraction of meaningful terms with a thematic IRS (a set
of lemmas extracted from the analyzed text: “gladiolus”, “bloom”, “summer”, etc.).
- Projection of the terms onto the thematic grid and calculation of the tile
frequencies (e.g. the “gladiolus” taxon of the botanic ontology retrieved among the
extracted elements of the text).
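The three steps above can be sketched as follows. This is only a minimal illustration under stated assumptions: the toy botanic ontology `PARENT`, the lemma-to-concept lexicon `LEXICON` and the function names are hypothetical and are not part of Terrier or of the system described here. Counting a term once for each ancestor tile is one possible reading of "each hierarchical level corresponds to a level of abstraction".

```python
from collections import Counter

# Hypothetical toy botanic ontology (the thematic grid):
# each concept is a tile and maps to its parent concept.
PARENT = {
    "Plantae": None,            # kingdom
    "Liliopsida": "Plantae",    # class
    "Asparagales": "Liliopsida",  # order
    "Iridaceae": "Asparagales",   # family
    "Gladiolus": "Iridaceae",     # genus
    "Iris": "Iridaceae",          # genus
}

# Hypothetical lexicon linking extracted lemmas to ontology concepts.
LEXICON = {"gladiolus": "Gladiolus", "iris": "Iris"}


def tiles_for(concept):
    """Yield the concept and all of its ancestors, up to the root."""
    while concept is not None:
        yield concept
        concept = PARENT[concept]


def tile_frequencies(lemmas):
    """Project lemmas onto the grid and count hits per tile.

    Lemmas with no matching concept (e.g. "bloom") are ignored.
    Assumption: a hit on a concept also counts for each ancestor tile.
    """
    freq = Counter()
    for lemma in lemmas:
        concept = LEXICON.get(lemma)
        if concept is None:
            continue
        for tile in tiles_for(concept):
            freq[tile] += 1
    return freq


# Lemmas as they might come out of the thematic IRS for one document.
lemmas = ["gladiolus", "bloom", "summer", "iris", "gladiolus"]
index = tile_frequencies(lemmas)
```

In this sketch, `index` plays the role of the second-level index entry for one document: the family-level tile "Iridaceae" aggregates the occurrences of both genera, which is what makes it more synthetic than the term-level index.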
As illustrated in [ZGH 08], the indexes created in this way can be used by
classic IR models such as the vector space model. Moreover, in accordance with the
proposals described in [LEG 10], these indexes can also be used by semantic IR
models that compute similarities between concepts, using measures such as those of
Wu and Palmer [WU 94], Resnik [RES 95] and Lin [LIN 98], as well as
Proxigenea [DUD 10].
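To make the idea of a similarity calculation between concepts concrete, the following sketch computes the Wu and Palmer measure on a toy hierarchy. It uses the commonly cited form sim(c1, c2) = 2 * depth(lcs) / (depth(c1) + depth(c2)), where lcs is the lowest common subsumer and the root is assumed to have depth 1. The ontology and all function names are hypothetical and chosen for illustration only.

```python
# Hypothetical toy botanic ontology: each concept maps to its parent.
PARENT = {
    "Plantae": None,
    "Liliopsida": "Plantae",
    "Asparagales": "Liliopsida",
    "Iridaceae": "Asparagales",
    "Gladiolus": "Iridaceae",
    "Iris": "Iridaceae",
}


def path_to_root(concept):
    """Return the list [concept, parent, ..., root]."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Depth counted in nodes, so the root has depth 1 (an assumption)."""
    return len(path_to_root(concept))


def lowest_common_subsumer(a, b):
    """Return the deepest concept that is an ancestor of (or equal to) both."""
    ancestors_a = set(path_to_root(a))
    for candidate in path_to_root(b):  # walks upward from b, deepest first
        if candidate in ancestors_a:
            return candidate
    raise ValueError("concepts do not share a root")


def wu_palmer(a, b):
    """Wu and Palmer similarity between two concepts of the hierarchy."""
    lcs = lowest_common_subsumer(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


sim_siblings = wu_palmer("Gladiolus", "Iris")       # share the family tile
sim_distant = wu_palmer("Gladiolus", "Liliopsida")  # genus vs. its class
```

Here the two genera, which share the family "Iridaceae", score higher than a genus compared with its class, reflecting the intuition that concepts meeting deeper in the ontology are more similar.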