Using the Euclidean Distance for Retrieval Evaluation
Shengli Wu 1, Yaxin Bi 1, and Xiaoqin Zeng 2
1 School of Computing and Mathematics,
University of Ulster, Northern Ireland, UK
{s.wu1,y.bi}@ulster.ac.uk
2 College of Computer and Information Engineering,
Hehai University, Nanjing, China
xzeng@hhu.edu.cn
Abstract. In information retrieval systems and digital libraries, the evaluation of retrieval results is a very important aspect. Up to now, almost all commonly used metrics, such as average precision and recall level precision, are ranking based metrics. In this work, we investigate whether it is a good option to use a score based method, the Euclidean distance, for retrieval evaluation. Two variations of it are discussed: one uses a linear model to estimate the relation between rank and relevance in result lists, and the other uses a more sophisticated cubic regression model for this purpose. Our experiments with two groups of results submitted to TREC demonstrate that the new metrics have strong correlation with ranking based metrics when we consider the average over all 50 queries. On the other hand, our experiments also show that one of the variations (the linear model) has better overall quality than all the ranking based metrics involved. Another surprising finding is that a commonly used metric, average precision, may not be as good as previously thought.
1 The Euclidean Distance
In information retrieval, how to evaluate retrieval results is an important problem. Much effort has been devoted to this and related issues, and many metrics for retrieval effectiveness have been proposed. Average precision (AP), recall level precision (RP), normalized discounted cumulative gain (NDCG) [4], and precision at the 10-document level (P10) are four of the most commonly used metrics. One major characteristic of these metrics is that they consider only the ranking positions of relevant and irrelevant documents; they are therefore referred to as ranking based metrics later in this paper.
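To make the ranking based metrics concrete, the following is a minimal sketch (not the official trec_eval implementation) of how AP and P10 can be computed from a ranked list of binary relevance judgments; the function names and the example judgments are illustrative only.

```python
from typing import List

def average_precision(rels: List[int], num_relevant: int) -> float:
    """AP: mean of the precision values at the ranks of relevant documents,
    averaged over the total number of relevant documents for the query."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / num_relevant if num_relevant else 0.0

def precision_at_k(rels: List[int], k: int = 10) -> float:
    """P10 when k = 10: the fraction of relevant documents in the top k positions."""
    return sum(rels[:k]) / k

# Illustrative ranked list: 1 = relevant, 0 = irrelevant.
judged = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
print(average_precision(judged, num_relevant=5))  # ~0.5976
print(precision_at_k(judged, 10))                 # 0.4
```

Both functions depend only on the positions of the relevant documents, which is exactly the property that distinguishes ranking based metrics from the score based approach studied in this paper.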
In fact, apart from a ranked list of documents, some information retrieval systems also provide relevance scores for all retrieved documents. For example, most of the runs submitted to TREC 1 provide such score information. Suppose for a collection D of documents {d_1, d_2, ..., d_n} and a given
1 TREC stands for Text REtrieval Conference. It is an annual information retrieval
evaluation event held by the National Institute of Standards and Technology of the
USA. Its web site is located at http://trec.nist.gov/.
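Although the formal definition continues beyond this excerpt, the general idea described in the abstract can be illustrated as follows: normalize the scores in a result list, estimate an expected relevance value for each rank, and measure the Euclidean distance between the two vectors, with a smaller distance indicating a better run. The sketch below is a hypothetical illustration, not the authors' exact formulation; in particular, the fixed linear relevance profile stands in for the linear regression model that the paper estimates from data.

```python
import math
from typing import List

def linear_relevance_estimate(n: int) -> List[float]:
    """Assumed linear model: expected relevance decreases linearly with rank,
    from 1 at the top of the list to 0 at the bottom.  The paper estimates
    this rank-relevance relation; the fixed slope here is only illustrative."""
    return [1.0 - i / (n - 1) for i in range(n)] if n > 1 else [1.0]

def euclidean_distance_metric(scores: List[float]) -> float:
    """Euclidean distance between normalized document scores and the
    rank-based relevance estimates; smaller values indicate that the
    scores follow the expected relevance profile more closely."""
    s_max, s_min = max(scores), min(scores)
    span = (s_max - s_min) or 1.0
    norm = [(s - s_min) / span for s in scores]        # scores mapped into [0, 1]
    expected = linear_relevance_estimate(len(scores))  # assumed relevance per rank
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(norm, expected)))

# Illustrative result list with system-assigned scores in ranked order.
print(euclidean_distance_metric([0.92, 0.85, 0.40, 0.33, 0.10]))
```

Unlike AP or P10, this quantity uses the actual score values, which is what makes it a score based metric in the sense discussed above.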