query Q, a result R from an information retrieval system IR is {s_1, s_2, ..., s_n}, where s_i is the score assigned to document d_i by IR. On the other hand, every document d_i has an ideal relevance score o_i. If binary relevance judgment is used, then the ideal score for any judged relevant document is 1 and that for any judged irrelevant document is 0. If 3-graded relevance judgment is used, then 0, 0.5, and 1 can be used as the ideal scores of irrelevant, modestly relevant, and highly relevant documents, respectively. In such a situation, the Euclidean distance between the scores in R and the ideal scores O = {o_1, o_2, ..., o_n} can be calculated by

distance(R, O) = \sqrt{\sum_{i=1}^{n} (s_i - o_i)^2}    (1)
distance(R, O) can be used as a metric to evaluate the effectiveness of R. If all
documents' scores are estimated accurately, then we can expect a low Euclidean
distance value; otherwise, we can expect a high one. The Euclidean distance has
been widely used as a metric in many areas such as data mining, neural networks, etc. However, to our knowledge, it has never been explored in information retrieval. Unlike ranking-based metrics, the Euclidean distance can be regarded as a relevance-score-based metric. It is therefore worth finding out whether the Euclidean distance is a good choice for the evaluation of information retrieval results.
At first glance, one may think that the requirement for using this metric is very demanding, since relevance scores for all the documents in the whole collection need to be provided. This is only true theoretically. In practice, we may use some
reasonable approximation methods. In TREC, only a certain number (say, 1000)
of documents are included in a result for a query. It is not known what the scores
are for those documents that do not occur in the result. Since those documents
are very likely irrelevant, we can reasonably assign a default score of 0 to all of
them. Another situation is: although some retrieval systems provide scores for
all retrieved documents, those scores may be in various ranges (say, one is in the
range of 1 to 1000 and another in the range of 0 to 10) and cannot be used directly
as relevance scores. In such a situation, we may use score normalization methods
to normalize all the scores to the desired range [0,1]. Several such methods have
been investigated before [5,6,11]. Finally, some retrieval systems may not provide
scores at all for the retrieved documents. Then we may assign different scores
to documents at different ranks in a systematic manner. One straightforward
method for this is the linear model: for a ranked list of m documents, the top
ranked document is given a score of 1, the 2nd document in the list is assigned
a score of (m−1)/m, ..., and the last document in the list is assigned a score of 1/m.
This is the method used in Borda voting. It has been used for data fusion [1]
in information retrieval as well. Alternatively, we may use nonlinear models to
estimate scores for documents at different ranks. For example, we may use the
cubic model [9] or other models (e.g., the logistic model [3]) to do this. The cubic
model can be expressed by the following equation
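As a concrete illustration of the two simpler approximations described above, the following Python sketch shows a min-max normalization of raw scores into [0,1] (one common choice, not necessarily the specific methods of [5,6,11]) and the linear, Borda-style assignment of scores by rank; the function names and example values are hypothetical.

def minmax_normalize(raw_scores):
    # Map raw retrieval scores, whatever their original range, into [0, 1].
    lo, hi = min(raw_scores), max(raw_scores)
    if hi == lo:                        # degenerate case: all scores identical
        return [1.0 for _ in raw_scores]
    return [(s - lo) / (hi - lo) for s in raw_scores]

def linear_rank_scores(m):
    # Linear (Borda-style) model: rank 1 gets 1, rank 2 gets (m-1)/m, ..., rank m gets 1/m.
    return [(m - i) / m for i in range(m)]

print(minmax_normalize([1000, 400, 10]))   # e.g. raw scores originally in the range 1..1000
print(linear_rank_scores(4))               # [1.0, 0.75, 0.5, 0.25]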