calculates the average of the scores of the spatial, temporal and thematic IRSs. The
GEOSEM [BIL 03], DIGMAP [MAR 07], PIV [GAI 08], GEOOREKA [BUS 09b]
and Local Search [BRI 10] systems implement linear combination.
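A linear combination of this kind can be sketched in a few lines. This is a minimal illustration, not the implementation of any of the systems cited above; the function name and the choice of equal default weights are assumptions.

```python
def combine_linear(spatial, temporal, thematic, weights=(1/3, 1/3, 1/3)):
    """Weighted linear combination of the spatial, temporal and thematic
    relevance scores of one document. Equal weights reduce to the plain
    average described in the text."""
    w_s, w_t, w_th = weights
    return w_s * spatial + w_t * temporal + w_th * thematic

# With equal weights, a document scored (0.9, 0.6, 0.3) gets the average 0.6.
score = combine_linear(0.9, 0.6, 0.3)
```

Changing the weights is what allows one criterion (say, the spatial one) to be favored over the others, a flexibility the scattered-ranking approach discussed next does not offer.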
A final combination approach, called “scattered ranking” [VAN 05], reorganizes the
retrieved documents in order to increase their diversity. The aim is either to push
some of the documents with similar scores further down the ranking, so as to vary
the results, or to group them together, as Google proposes.
The approach consists of spreading the documents (results) over an n-dimensional
space; those closest to the origin are the most relevant. The points (results) are then
compared in pairs, and a result that lies too close to another is projected further
down the ranking. Van Kreveld et al. [VAN 05] apply this approach to the spatial
and thematic dimensions, whereas Purves et al. [PUR 07] test it on the SPIRIT
system. This approach does not allow the combination to be modulated (by favoring
one criterion, for example), given that the ranking of results is recalculated
dynamically.
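The mechanism described above can be sketched greedily: rank documents by their distance to the origin of the score space, then demote any document lying too close to a more relevant one already kept. This is a simplified sketch of the idea, not the algorithm of [VAN 05]; the function name and the `min_dist` threshold are assumptions.

```python
import math

def scattered_ranking(points, min_dist):
    """Diversity re-ranking sketch. `points` maps document ids to
    coordinates in the n-dimensional score space; documents nearest the
    origin are the most relevant. A document closer than `min_dist` to an
    already kept, more relevant document is pushed to the end of the
    ranking."""
    origin = (0.0,) * len(next(iter(points.values())))
    # Initial ranking: distance to the origin (smaller = more relevant).
    ranked = sorted(points, key=lambda d: math.dist(points[d], origin))
    kept, demoted = [], []
    for doc in ranked:
        too_close = any(math.dist(points[doc], points[k]) < min_dist
                        for k in kept)
        (demoted if too_close else kept).append(doc)
    return kept + demoted
```

For example, with two near-duplicate relevant results and one distant one, the duplicate is relegated behind the distant result, increasing the variety of the top of the list.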
3.3.2.4. Aggregation and multicriteria IR
In IR, Kelly and Fu [KEL 07] have shown a strong relationship between query
expansion and performance. Similarly, in multicriteria IR, Croft and
Harabagiu [CRO 00] have shown the value of approaches that combine different
strategies for representing and retrieving information in textual content as
techniques for improving the effectiveness of IR. They emphasize three categories of approach: the
combination of different representations of the corpus before the IR algorithms, the
combination of different IR algorithms and the combination of results from different
IR algorithms. The aggregation models based on relevance scores proposed by
Fox and Shaw [FOX 93], as well as by Fernandez et al. [FER 06], fall into
the third category. Both propose normalizing the scores before
the aggregation phase. Farah and Vanderpooten [FAR 08] define result
aggregation as a process of ranking documents combining the scores (“retrieval
status values” or RSV) obtained for each search criterion. Figure 3.1 illustrates the
principles of multicriteria IR. A multicriteria query translates the user need. The
search engine analyzes the corpus in such a way as to find correspondences between
the contents of the documents and the search criteria. A list of results L j , containing
documents and their relevance scores, is created for each criterion Criterion j. Then,
the search engine aggregates these lists according to an aggregation function f and
produces the final result list L which is presented to the user.
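The pipeline of Figure 3.1 can be sketched as follows: normalize each per-criterion list L j, then merge them with an aggregation function f. As the aggregation function, the sketch uses a simple summation of normalized scores, in the spirit of the CombSUM rule of Fox and Shaw [FOX 93]; the function names and data layout are assumptions for illustration.

```python
def min_max_normalize(scores):
    """Rescale a {doc: rsv} mapping to [0, 1] (min-max normalization)."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid division by zero on constant lists
    return {d: (s - lo) / span for d, s in scores.items()}

def aggregate(lists, f=sum):
    """Merge the per-criterion result lists L_j with an aggregation
    function f. `lists` is a sequence of {doc: rsv_j} mappings, one per
    criterion. Each list is normalized before aggregation; a document
    absent from a list contributes a score of 0 for that criterion."""
    norm = [min_max_normalize(L) for L in lists]
    docs = set().union(*norm)
    merged = {d: f(L.get(d, 0.0) for L in norm) for d in docs}
    # Final list L: documents ranked by aggregated score, best first.
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)
```

Passing `f=max` instead of `sum` would keep, for each document, only its best normalized score across criteria, illustrating how the choice of f shapes the final ranking.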
Farah and Vanderpooten [FAR 08] define three classes of aggregation functions
applying the principles illustrated in Figure 3.1:
- Totally compensatory aggregation: This consists of computing a score from a set
of scores rsv ji assigned to a document d i for each criterion of the query, by applying