Using the Euclidean Distance for Retrieval Evaluation
Shengli Wu 1, Yaxin Bi 1, and Xiaoqin Zeng 2
1 School of Computing and Mathematics,
University of Ulster, Northern Ireland, UK
{s.wu1,y.bi}@ulster.ac.uk
2 College of Computer and Information Engineering,
Hehai University, Nanjing, China
xzeng@hhu.edu.cn
Abstract. In information retrieval systems and digital libraries, the evaluation of retrieval results is a very important aspect. Up to now, almost all commonly used metrics, such as average precision and recall level precision, are ranking based metrics. In this work, we investigate whether it is a good option to use a score based method, the Euclidean distance, for retrieval evaluation. Two variations of it are discussed: one uses a linear model to estimate the relation between rank and relevance in result lists, and the other uses a more sophisticated cubic regression model for this purpose. Our experiments with two groups of results submitted to TREC demonstrate that the new metrics have strong correlation with ranking based metrics when we consider the average over all 50 queries. On the other hand, our experiments also show that one of the variations (the linear model) has better overall quality than all the ranking based metrics involved. Another surprising finding is that a commonly used metric, average precision, may not be as good as previously thought.
1 The Euclidean Distance
In information retrieval, how to evaluate retrieval results is an important problem. Much effort has been devoted to this and related issues, and many metrics for retrieval effectiveness have been proposed. Average precision (AP), recall level precision (RP), normalized discounted cumulative gain (NDCG) [4], and precision at the 10-document level (P10) are four of the most commonly used metrics. One major characteristic of these metrics is that they consider only the ranking positions of relevant and irrelevant documents; they are therefore referred to as ranking based metrics later in this paper.
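To make the ranking based metrics concrete, the following is a minimal sketch (not the official trec_eval implementation) of how AP and P10 can be computed from a ranked list of binary relevance judgments; the function names and the example judgments are illustrative only.

```python
from typing import List

def average_precision(rels: List[int], num_relevant: int) -> float:
    """AP: mean of the precision values at the ranks of relevant documents,
    averaged over the total number of relevant documents for the query."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / num_relevant if num_relevant else 0.0

def precision_at_k(rels: List[int], k: int = 10) -> float:
    """P10 when k = 10: the fraction of relevant documents in the top k positions."""
    return sum(rels[:k]) / k

# Illustrative ranked list: 1 = relevant, 0 = irrelevant.
judged = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
print(average_precision(judged, num_relevant=5))  # ~0.5976
print(precision_at_k(judged, 10))                 # 0.4
```

Both functions depend only on the positions of the relevant documents, which is exactly the property that distinguishes ranking based metrics from the score based approach studied in this paper.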
In fact, apart from a ranked list of documents, some information retrieval systems also provide relevance scores for all retrieved documents. For example, most of the runs submitted to TREC 1 provide such score information. Suppose for a collection D of documents {d_1, d_2, ..., d_n} and a given
1 TREC stands for Text REtrieval Conference. It is an annual information retrieval
evaluation event held by the National Institute of Standards and Technology of the
USA. Its web site is located at http://trec.nist.gov/.
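Although the formal definition continues beyond this excerpt, the general idea described in the abstract can be illustrated as follows: normalize the scores in a result list, estimate an expected relevance value for each rank, and measure the Euclidean distance between the two vectors, with a smaller distance indicating a better run. The sketch below is a hypothetical illustration, not the authors' exact formulation; in particular, the fixed linear relevance profile stands in for the linear regression model that the paper estimates from data.

```python
import math
from typing import List

def linear_relevance_estimate(n: int) -> List[float]:
    """Assumed linear model: expected relevance decreases linearly with rank,
    from 1 at the top of the list to 0 at the bottom.  The paper estimates
    this rank-relevance relation; the fixed slope here is only illustrative."""
    return [1.0 - i / (n - 1) for i in range(n)] if n > 1 else [1.0]

def euclidean_distance_metric(scores: List[float]) -> float:
    """Euclidean distance between normalized document scores and the
    rank-based relevance estimates; smaller values indicate that the
    scores follow the expected relevance profile more closely."""
    s_max, s_min = max(scores), min(scores)
    span = (s_max - s_min) or 1.0
    norm = [(s - s_min) / span for s in scores]        # scores mapped into [0, 1]
    expected = linear_relevance_estimate(len(scores))  # assumed relevance per rank
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(norm, expected)))

# Illustrative result list with system-assigned scores in ranked order.
print(euclidean_distance_metric([0.92, 0.85, 0.40, 0.33, 0.10]))
```

Unlike AP or P10, this quantity uses the actual score values, which is what makes it a score based metric in the sense discussed above.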