Using the Euclidean Distance for Retrieval Evaluation - Advances in Databases


P10 is defined as the percentage of relevant documents among the top 10 documents in $R$.

NDCG was introduced in [4]. Each ranking position in a resultant document list is assigned a weight. The top-ranked documents are assigned the highest weights since they are the most convenient ones for users to read. A logarithmic weighting schema was proposed in [4], which takes a given whole number $c$ as a parameter. The first $c$ documents are assigned a weight of 1; any document ranked at position $i > c$ is assigned the weight $w(i) = \ln(c)/\ln(i)$. For a resultant document list of up to $t$ documents, its discounted cumulated gain (DCG) is defined as

$$\mathrm{DCG} = \sum_{i=1}^{t} \bigl( w(i) \cdot r(i) \bigr)$$

If the $i$-th document is relevant, then $r(i) = 1$; if the $i$-th document is irrelevant, then $r(i) = 0$. DCG can be normalized using a normalization coefficient $\mathrm{DCG}_{best}$, which is the DCG value of the best resultant lists. Therefore, we have:

$$\mathrm{NDCG} = \frac{1}{\mathrm{DCG}_{best}} \sum_{i=1}^{t} \bigl( w(i) \cdot r(i) \bigr)$$
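The weighting schema and the DCG and NDCG definitions above can be sketched in Python as follows. This is only an illustrative sketch for binary relevance: the function names and the `num_relevant` parameter (the total number of relevant documents for the query, used to build a best list) are assumptions made here, not part of [4]. It also assumes $c \geq 2$, since $\ln(1) = 0$ would give every rank after the first a weight of zero.

```python
import math

def weight(i, c):
    """Weight of 1-based rank i: 1 for the first c ranks, ln(c)/ln(i) after that."""
    return 1.0 if i <= c else math.log(c) / math.log(i)

def dcg(rels, c, t=None):
    """DCG over the top t positions; rels[i-1] is r(i), 1 if relevant else 0."""
    top = rels if t is None else rels[:t]
    return sum(weight(i, c) * r for i, r in enumerate(top, start=1))

def ndcg(rels, num_relevant, c, t=None):
    """DCG divided by DCG_best, the DCG of a list that ranks all relevant documents first."""
    depth = len(rels) if t is None else t
    # Assumption: a best list holds as many relevant documents as fit in the top `depth`.
    best = [1] * min(num_relevant, depth)
    dcg_best = dcg(best, c)
    return dcg(rels, c, t) / dcg_best if dcg_best > 0 else 0.0

# Hypothetical ranked list: relevant documents at ranks 1 and 3.
rels = [1, 0, 1, 0]
print(dcg(rels, c=2))
print(ndcg(rels, num_relevant=2, c=2))
```

With $c = 2$, the document at rank 3 contributes $\ln(2)/\ln(3)$ rather than 1, so this list scores below a list that places both relevant documents in the first two positions.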

Now let us see a way of extending AP, RP, and P10 for graded relevance judgement [12]. Note that NDCG can be used under graded relevance judgement directly, so no extension is needed for it.

Suppose there are $n$ relevance grades ranging from 1 to $n$, in addition to grade 0 ($n$ means the most relevant state and 0 means the irrelevant state); then each document $d_i$ can be assigned a grade $g(d_i)$ according to its degree of relevance to the given query. One primary assumption made about documents in the various grades is: a document in grade $n$ is regarded as 100% relevant and 100% useful to users, and a document in grade $i$ ($i < n$) is regarded as $(100 \cdot i/n)\%$ relevant and $(100 \cdot i/n)\%$ useful to users. Suppose there are $total_n$ documents whose grades are above 0, with $total_n = |r_1| + |r_2| + \dots + |r_n|$. Here $|r_i|$ denotes the number of documents in grade $i$. First let us see the concept of the best resultant list. For the given query $Q$, a resultant list $L$ is best if it satisfies the following two conditions:

* all the documents whose grades are above 0 appear in the list;

* for any document pair $d_i$ and $d_j$, if $d_i$ is ranked in front of $d_j$, then $g(d_i) \geq g(d_j)$.
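The two conditions can be checked mechanically. The following Python sketch is an illustration under assumptions made here: documents are identified by hypothetical string IDs, `grades` maps each ID to its grade, and any listed document missing from `grades` is treated as grade 0.

```python
def is_best(ranking, grades):
    """Return True if `ranking` (a list of document IDs) is a best resultant list."""
    listed = set(ranking)
    # Condition 1: every document whose grade is above 0 appears in the list.
    all_present = all(d in listed for d, g in grades.items() if g > 0)
    # Condition 2: grades never increase when moving down the ranking.
    ranked = [grades.get(d, 0) for d in ranking]
    non_increasing = all(a >= b for a, b in zip(ranked, ranked[1:]))
    return all_present and non_increasing

def best_ranking(grades):
    """Build one best list by sorting the documents with grade > 0 by descending grade."""
    positive = [d for d, g in grades.items() if g > 0]
    return sorted(positive, key=lambda d: grades[d], reverse=True)

# Hypothetical judgements with n = 2.
grades = {"a": 2, "b": 1, "c": 0, "d": 2}
print(is_best(["a", "d", "b", "c"], grades))
print(is_best(["a", "b", "d"], grades))
print(is_best(best_ranking(grades), grades))
```

Because ties within a grade may be ordered freely, `best_ranking` returns only one of possibly several best lists.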

Many resultant lists can be the best at the same time, since more than one document can be in the same grade and the documents in the same grade can be arranged in different orders, but the relative ranking positions of documents in different grades cannot be changed. Therefore, we can use $g_{best}(d_j)$ to refer to the grade of the document in ranking position $j$ in one of these best resultant lists. We may also sum up the grades of the documents in the top $|r_n|$, top $(|r_n| + |r_{n-1}|)$, ..., top $(|r_n| + |r_{n-1}| + \dots + |r_1|)$ positions for any of the best resultant lists (these sums are the same for all the best resultant lists):
