different collections and re-rank them as a new single list. In this section we compare
different state-of-the-art unsupervised merging algorithms in experiments on the
FedWeb 2012 dataset [ 26 ]. We first introduce in Sect. 4.4.1 the four different merging
algorithms used in our experiments. Then, in Sect. 4.4.2, we present the results of these
experiments by calculating information retrieval metrics (precision, recall, normalized
discounted cumulative gain) resulting from these approaches under different retrieval
settings.
4.4.1 Algorithms
Over the years, various algorithms have been introduced that merge result lists from
different indices. In the remainder of this section, we introduce the most common
unsupervised merging algorithms, namely CORI, weighted MinMax, and round robin.
We also introduce our own result merging algorithm, the naive merger, and compare
it with these algorithms.
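Of the algorithms above, round robin is the simplest: it ignores scores entirely and interleaves the ranked lists, taking one document from each collection in turn. A minimal sketch (document identifiers are illustrative):

```python
from itertools import zip_longest

def round_robin(*ranked_lists):
    """Merge ranked lists by taking one document from each list in turn,
    skipping lists that have run out of documents."""
    merged = []
    for group in zip_longest(*ranked_lists):
        merged.extend(doc for doc in group if doc is not None)
    return merged

print(round_robin(["a1", "a2", "a3"], ["b1", "b2"]))
# -> ['a1', 'b1', 'a2', 'b2', 'a3']
```

Because round robin never looks at retrieval scores, it needs no score normalization, but it also cannot prefer documents from more relevant collections.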
4.4.1.1 CORI
CORI was introduced by Callan et al. [ 6 ], who suggested calculating the relevance
of a collection as a weight and using this as a parameter to recalculate each
document score. It is important to note that CORI can also be used to rank collections,
which we do not do in our study. Let R denote the collection from which a
document is retrieved, d a retrieved document, and q an incoming query; the
CORI re-ranking is then defined as:
s_norm(d | q) = ((1 + 0.4 · s_MinMax(R | q)) / 1.4) · s_MinMax(d | q)    (4.1)
The value 0.4 was proposed by the authors as the default value determining how much
influence the collection weight should have. CORI is considered a state-of-the-art
algorithm, since experiments indicate that it is a robust unsupervised linear score
normalization method.
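The steps above can be sketched end to end: min-max normalize the document scores within each result list and the collection scores across collections, then apply Eq. (4.1) and sort. The collections and raw scores below are illustrative, not from the FedWeb 2012 experiments.

```python
def minmax(scores):
    """Min-max normalize a list of raw scores to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def cori_score(doc_norm, coll_norm):
    """CORI re-ranked score, Eq. (4.1): (1 + 0.4 * s(R|q)) / 1.4 * s(d|q)."""
    return (1 + 0.4 * coll_norm) / 1.4 * doc_norm

# Illustrative input: collection-selection scores and per-collection
# result lists of (doc_id, raw retrieval score).
coll_raw = {"A": 3.0, "B": 1.0}
results = {
    "A": [("a1", 9.0), ("a2", 5.0), ("a3", 1.0)],
    "B": [("b1", 8.0), ("b2", 4.0)],
}

coll_norm = dict(zip(coll_raw, minmax(list(coll_raw.values()))))

merged = []
for coll, docs in results.items():
    doc_norm = minmax([s for _, s in docs])
    for (doc_id, _), s in zip(docs, doc_norm):
        merged.append((doc_id, cori_score(s, coll_norm[coll])))

merged.sort(key=lambda t: t[1], reverse=True)
print(merged)
```

Note how the collection weight pulls documents from the stronger collection A above equally ranked documents from B: B's best document is demoted below A's best, even though both had the top normalized score in their own list.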
4.4.1.2 Weighted MinMax
Markov et al. [ 22 ] proposed a modification of the CORI algorithm, referred to as
weighted MinMax. In this paper the authors replace the constant 0.4, which repre-
sents the importance of a collection, with a variable λ. The authors investigated
how the result merging performance of CORI is influenced by varying this λ
parameter. The authors concluded that by setting λ to infinity
(λ → ∞) they can outperform other unsupervised linear score normalization
methods.
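A rough sketch of this parameterization, under the assumption that CORI's constants 0.4 and 1.4 = 1 + 0.4 generalize to λ and 1 + λ (the exact form used by Markov et al. may differ). Under this assumption, letting λ → ∞ reduces the score to the plain product of the normalized collection and document scores:

```python
def weighted_minmax(doc_norm, coll_norm, lam):
    """Assumed generalization of Eq. (4.1): (1 + lam * s(R|q)) / (1 + lam) * s(d|q).
    With lam = 0.4 this is exactly CORI."""
    return (1 + lam * coll_norm) / (1 + lam) * doc_norm

def weighted_minmax_inf(doc_norm, coll_norm):
    """Limit lam -> infinity: the collection weight dominates, and the
    score becomes s(R|q) * s(d|q)."""
    return coll_norm * doc_norm

# Increasing lambda converges to the infinite-lambda limit.
for lam in (0.4, 10, 1e6):
    print(lam, weighted_minmax(0.5, 0.8, lam))
print("inf", weighted_minmax_inf(0.5, 0.8))
```

With λ = 0.4 this recovers CORI exactly; larger λ values weight the collection score more heavily, which matches the intuition behind the authors' λ → ∞ result.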