2.4.4. Spatial and temporal indexing process flows: PIV prototype
Following the specifications of Clough et al. [CLO 06], we handle the spatial and
temporal dimensions independently and build a specialized index for each. This
keeps both one-dimensional retrieval and index maintenance (adding new documents
to the corpus) efficient [MAR 05].
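The following is a minimal sketch of what keeping the two dimensions separate can look like in practice. It is not the PIV code. It assumes, hypothetically, that each SF is reduced to a bounding box and each TF to a closed date interval. The class names `SpatialIndex` and `TemporalIndex` are illustrative, and the linear scans stand in for whatever dedicated structure (for example an R-tree or an interval tree) a real system would use. Adding a document updates each index without touching the other, and each query covers a single dimension.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical representations: an SF is a bounding box (min_x, min_y, max_x, max_y),
# a TF is a closed date interval (start, end).


@dataclass
class SpatialIndex:
    entries: dict = field(default_factory=dict)  # doc_id -> list of boxes

    def add(self, doc_id, box):
        # Adding a document touches only the spatial index.
        self.entries.setdefault(doc_id, []).append(box)

    def query(self, q):
        # One-dimensional (spatial only) retrieval: boxes intersecting q.
        qx0, qy0, qx1, qy1 = q
        return {
            doc_id
            for doc_id, boxes in self.entries.items()
            if any(x0 <= qx1 and qx0 <= x1 and y0 <= qy1 and qy0 <= y1
                   for x0, y0, x1, y1 in boxes)
        }


@dataclass
class TemporalIndex:
    entries: dict = field(default_factory=dict)  # doc_id -> list of intervals

    def add(self, doc_id, interval):
        # Adding a document touches only the temporal index.
        self.entries.setdefault(doc_id, []).append(interval)

    def query(self, q):
        # One-dimensional (temporal only) retrieval: intervals overlapping q.
        q_start, q_end = q
        return {
            doc_id
            for doc_id, intervals in self.entries.items()
            if any(start <= q_end and q_start <= end for start, end in intervals)
        }


spatial, temporal = SpatialIndex(), TemporalIndex()
spatial.add("doc1", (0.0, 0.0, 1.0, 1.0))  # arbitrary coordinates
temporal.add("doc1", (date(2011, 6, 21), date(2011, 6, 21)))
temporal.add("doc2", (date(2010, 1, 1), date(2010, 12, 31)))  # doc2 has no SF

print(spatial.query((0.5, 0.5, 2.0, 2.0)))                    # {'doc1'}
print(temporal.query((date(2011, 6, 1), date(2011, 6, 30))))  # {'doc1'}
```

Because `doc2` has no spatial feature, it is only ever added to the temporal index. Neither index needs to know about the other.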
The spatial and temporal process flows consist of three main processing steps
(Figure 2.6). The first step extracts the SFs and TFs using a lexico-syntactic
process [GAI 08], which runs on the LINGUASTREAM platform [BIL 03, WID 05]. After a
conventional preliminary segmentation of the content into tokens (tokenization),
the process performs a form of active parsing. First, a candidate-token tagger uses
typographical rules and lexical resources to identify the tokens that correspond to
spatial information and those that correspond to temporal information. Then a
morpho-syntactic analysis gathers tokens into nominal groups that form candidate SFs
and TFs (for example, “torrent of Pau” and “Tuesday 21 June 2011”). A candidate SF
contains a proper noun, and a candidate TF contains a number. Each candidate also
receives a reliability score, which is higher or lower depending on whether an
introducer listed in the lexical resources is present. Grammatically, these
introducers fall into the following categories: prepositions of place (in, on,
under, next to, near, far, etc.), prepositions of time (of, in, since, during,
etc.), adverbs of place (in the proximity of, around, etc.), adverbs of time
(around, after, before, etc.) and adjectives of location (back, central, north,
higher, etc.).
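The extraction step described above can be illustrated with a toy sketch. It is not the LINGUASTREAM implementation. The lexicons (`SPATIAL_INTRODUCERS`, `TEMPORAL_INTRODUCERS`, `GEO_NOUNS`, `TIME_WORDS`, `DETERMINERS`) are hypothetical. Capitalization stands in for proper-noun detection, the grouping rule is a simplification, and the scores 0.9 and 0.5 are arbitrary placeholders. The sketch runs the pipeline in order:

1. Tokenize the text.
2. Tag candidate tokens as spatial or temporal.
3. Group tagged tokens into nominal groups.
4. Keep the SF groups that contain a proper noun and the TF groups that contain a number.
5. Score each kept group according to whether a known introducer precedes it, skipping determiners.

```python
import re

# Hypothetical miniature lexical resources; real ones are far larger.
SPATIAL_INTRODUCERS = {"in", "on", "under", "near", "around", "north", "central"}
TEMPORAL_INTRODUCERS = {"of", "in", "since", "during", "around", "after", "before"}
GEO_NOUNS = {"torrent", "lake", "valley", "peak", "river"}
DETERMINERS = {"the", "a", "an"}
TIME_WORDS = {
    "monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday",
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
}


def tokenize(text):
    """Preliminary segmentation into words, numbers and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def tag(tokens, i):
    """Candidate-token tagger: 'S' (spatial), 'T' (temporal) or None."""
    tok, low = tokens[i], tokens[i].lower()
    if tok.isdigit() or low in TIME_WORDS:  # typographical and lexical cues
        return "T"
    if low in GEO_NOUNS:
        return "S"
    sentence_initial = i == 0 or tokens[i - 1] in {".", "!", "?"}
    if tok[0].isupper() and not sentence_initial:  # capitalized: proper noun
        return "S"
    return None


def group(tokens):
    """Gather runs of same-tag tokens into (kind, start, end) nominal groups.
    An 'of' between two spatial tokens is absorbed ("torrent of Pau")."""
    groups, i = [], 0
    while i < len(tokens):
        kind = tag(tokens, i)
        if kind is None:
            i += 1
            continue
        j = i + 1
        while j < len(tokens):
            if tag(tokens, j) == kind:
                j += 1
            elif (kind == "S" and tokens[j].lower() == "of"
                  and j + 1 < len(tokens) and tag(tokens, j + 1) == "S"):
                j += 2
            else:
                break
        groups.append((kind, i, j))
        i = j
    return groups


def preceding_word(tokens, start):
    """Word before a group, skipping determiners ("near the torrent")."""
    k = start - 1
    while k >= 0 and tokens[k].lower() in DETERMINERS:
        k -= 1
    return tokens[k].lower() if k >= 0 else ""


def extract(text):
    """Return (kind, surface form, reliability score) for each candidate."""
    tokens = tokenize(text)
    features = []
    for kind, start, end in group(tokens):
        words = tokens[start:end]
        if kind == "S" and not any(w[0].isupper() for w in words):
            continue  # a candidate SF must contain a proper noun
        if kind == "T" and not any(w.isdigit() for w in words):
            continue  # a candidate TF must contain a number
        introducers = SPATIAL_INTRODUCERS if kind == "S" else TEMPORAL_INTRODUCERS
        # Placeholder scores: higher when a known introducer precedes the group.
        score = 0.9 if preceding_word(tokens, start) in introducers else 0.5
        features.append((kind, " ".join(words), score))
    return features


print(extract("Snow has covered the area near the torrent of Pau since Tuesday 21 June 2011."))
# [('S', 'torrent of Pau', 0.9), ('T', 'Tuesday 21 June 2011', 0.9)]
```

Without an introducer, as in "The torrent of Pau was flooded.", the same SF is still extracted but receives the lower placeholder score of 0.5.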
Figure 2.6. The three main steps of indexing in the PIV prototype