calculates the average of the scores of the spatial, temporal and thematic IRSs. The
GEOSEM [BIL 03], DIGMAP [MAR 07], PIV [GAI 08], GEOOREKA [BUS 09b]
and Local Search [BRI 10] systems implement linear combination.
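A linear combination of this kind can be sketched in a few lines. This is a minimal illustration, not the implementation of any of the systems cited above; the function name and the choice of equal default weights are assumptions.

```python
def combine_linear(spatial, temporal, thematic, weights=(1/3, 1/3, 1/3)):
    """Weighted linear combination of the spatial, temporal and thematic
    relevance scores of one document. Equal weights reduce to the plain
    average described in the text."""
    w_s, w_t, w_th = weights
    return w_s * spatial + w_t * temporal + w_th * thematic

# With equal weights, a document scored (0.9, 0.6, 0.3) gets the average 0.6.
score = combine_linear(0.9, 0.6, 0.3)
```

Changing the weights is what allows one criterion (say, the spatial one) to be favored over the others, a flexibility the scattered-ranking approach discussed next does not offer.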
A final combination approach, called “scattered ranking” [VAN 05], reorganizes the
retrieved documents in order to increase their diversity. The aim is either to push
some of the documents with similar scores further down the ranking, so as to vary
the results, or to group them together, as Google proposes.
The approach consists of spreading the documents (results) over an n-dimensional
space; those closest to the origin are the most relevant. The points (results) are then
compared in pairs, and a result that lies too close to another is projected further
down the ranking. Van Kreveld et al. [VAN 05] apply this approach to the spatial
and thematic dimensions, whereas Purves et al. [PUR 07] test it on the SPIRIT
system. This approach does not allow the combination to be modulated (by favoring
one criterion, for example), given that the ranking of results is recalculated
dynamically.
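The mechanism described above can be sketched greedily: rank documents by their distance to the origin of the score space, then demote any document lying too close to a more relevant one already kept. This is a simplified sketch of the idea, not the algorithm of [VAN 05]; the function name and the `min_dist` threshold are assumptions.

```python
import math

def scattered_ranking(points, min_dist):
    """Diversity re-ranking sketch. `points` maps document ids to
    coordinates in the n-dimensional score space; documents nearest the
    origin are the most relevant. A document closer than `min_dist` to an
    already kept, more relevant document is pushed to the end of the
    ranking."""
    origin = (0.0,) * len(next(iter(points.values())))
    # Initial ranking: distance to the origin (smaller = more relevant).
    ranked = sorted(points, key=lambda d: math.dist(points[d], origin))
    kept, demoted = [], []
    for doc in ranked:
        too_close = any(math.dist(points[doc], points[k]) < min_dist
                        for k in kept)
        (demoted if too_close else kept).append(doc)
    return kept + demoted
```

For example, with two near-duplicate relevant results and one distant one, the duplicate is relegated behind the distant result, increasing the variety of the top of the list.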
3.3.2.4. Aggregation and multicriteria IR
In IR, Kelly and Fu [KEL 07] have shown a strong relationship between query
expansion and performance. Similarly, in multicriteria IR, Croft and
Harabagiu [CRO 00] have shown the value of approaches that combine different
strategies for representing and retrieving information in textual content as
techniques for improving the effectiveness of IR. They emphasize three categories of approach: the
combination of different representations of the corpus before the IR algorithms, the
combination of different IR algorithms and the combination of results from different
IR algorithms. The aggregation models based on relevance scores proposed by
Fox and Shaw [FOX 93], as well as by Fernandez et al. [FER 06], fall into
the third category. Both propose normalizing the scores before
the aggregation phase. Farah and Vanderpooten [FAR 08] define result
aggregation as a process of ranking documents combining the scores (“retrieval
status values” or RSV) obtained for each search criterion. Figure 3.1 illustrates the
principles of multicriteria IR. A multicriteria query translates the user need. The
search engine analyzes the corpus in such a way as to find correspondences between
the contents of the documents and the search criteria. A list of results L j , containing
documents and their relevance scores, is created for each criterion Criterion j. Then,
the search engine aggregates these lists according to an aggregation function f and
produces the final result list L which is presented to the user.
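The pipeline of Figure 3.1 can be sketched as follows: normalize each per-criterion list L j, then merge them with an aggregation function f. As the aggregation function, the sketch uses a simple summation of normalized scores, in the spirit of the CombSUM rule of Fox and Shaw [FOX 93]; the function names and data layout are assumptions for illustration.

```python
def min_max_normalize(scores):
    """Rescale a {doc: rsv} mapping to [0, 1] (min-max normalization)."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid division by zero on constant lists
    return {d: (s - lo) / span for d, s in scores.items()}

def aggregate(lists, f=sum):
    """Merge the per-criterion result lists L_j with an aggregation
    function f. `lists` is a sequence of {doc: rsv_j} mappings, one per
    criterion. Each list is normalized before aggregation; a document
    absent from a list contributes a score of 0 for that criterion."""
    norm = [min_max_normalize(L) for L in lists]
    docs = set().union(*norm)
    merged = {d: f(L.get(d, 0.0) for L in norm) for d in docs}
    # Final list L: documents ranked by aggregated score, best first.
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)
```

Passing `f=max` instead of `sum` would keep, for each document, only its best normalized score across criteria, illustrating how the choice of f shapes the final ranking.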
Farah and Vanderpooten [FAR 08] define three classes of aggregation functions
applying the principles illustrated in Figure 3.1:
- Totally compensatory aggregation: This consists of computing a score from a set
of scores rsv ji assigned to a document d i for each criterion of the query, by applying