\[
b_n = \sum_{i=1}^{|r_n|} g(d_i), \quad
b_{n-1} = \sum_{i=1}^{|r_n| + |r_{n-1}|} g(d_i), \quad \ldots, \quad
b = b_1 = \sum_{i=1}^{|r_n| + |r_{n-1}| + \cdots + |r_1|} g(d_i)
\]
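As a small worked illustration of these sums, assume three-category relevance ($n = 2$), with $g(d) = 2$ for documents in $r_2$ and $g(d) = 1$ for documents in $r_1$, and assume the $d_i$ are listed in descending order of grade (an ideal ranking). The set sizes $|r_2| = 2$ and $|r_1| = 3$ are made-up values used only for illustration.

```latex
\begin{align*}
b_2 &= \sum_{i=1}^{2} g(d_i) = 2 + 2 = 4, \\
b = b_1 &= \sum_{i=1}^{2+3} g(d_i) = 2 + 2 + 1 + 1 + 1 = 7.
\end{align*}
```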
AP can be defined as
\[
AP = \frac{1}{b} \sum_{i=1}^{total\_n} \left( g(d_{p_i}) \cdot \frac{\sum_{j=1}^{i} g(d_{p_j})}{p_i} \right)
\]
Here $p_j$ is the ranking position of the $j$-th document whose grade is above 0, and
$\sum_{j=1}^{i} g(d_{p_j})$ is the total sum of grades for documents up to rank $p_i$. Considering
all these $total\_n$ documents in the whole collection whose grades are above 0, AP
needs to calculate the precision at all these document levels ($p_1, p_2, \ldots, p_{total\_n}$).
At any $p_i$, precision is calculated as $\sum_{j=1}^{i} g(d_{p_j}) / p_i$, and then a weight of $g(d_{p_i})$
is applied. In this way the documents in higher grades have a bigger contribution
to the final value of AP.
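The AP computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `graded_ap`, its parameters, and the convention that relevant documents missing from the list contribute zero are assumptions made here, and the example grades are scaled to at most 1.

```python
from typing import Sequence


def graded_ap(ranked_grades: Sequence[float], b: float) -> float:
    """Graded average precision for one ranked list (hypothetical helper).

    ranked_grades[k] is g(d) for the document at rank k + 1.
    b is the total sum of grades over all documents in the collection
    whose grade is above 0.
    Assumption: relevant documents absent from the list contribute 0.
    """
    if b <= 0:
        raise ValueError("b must be positive")
    running_sum = 0.0  # sum of g(d_{p_j}) for j = 1..i
    weighted = 0.0     # sum over i of g(d_{p_i}) * precision at p_i
    for rank, grade in enumerate(ranked_grades, start=1):
        if grade > 0:
            running_sum += grade
            weighted += grade * running_sum / rank
    return weighted / b


# Binary grades: relevant documents at ranks 1 and 3, two relevant in total.
binary_ap = graded_ap([1, 0, 1], b=2)

# Graded example with grades scaled to [0, 1] (an assumption of this sketch).
graded_example = graded_ap([1.0, 0.0, 0.5], b=1.5)
```

With binary grades, `b` equals the number of relevant documents and the function reduces to the usual average precision, here (1/1 + 2/3) / 2.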
For RP, first we only consider the top $|r_n|$ documents;
$\frac{1}{b_n} \sum_{j=1}^{|r_n|} g(d_j)$ can be used to evaluate their precision.
Next we consider the top $|r_n| + |r_{n-1}|$ documents;
$\frac{1}{b_{n-1}} \sum_{j=1}^{|r_n| + |r_{n-1}|} g(d_j)$ can be used to evaluate their precision.
We continue this process until finally we consider all top $total\_n$ documents using
$\frac{1}{b} \sum_{j=1}^{|r_n| + \cdots + |r_1|} g(d_j)$.
Combining all these, we have
\[
RP = \frac{1}{n} \left\{
\frac{1}{b_n} \sum_{j=1}^{|r_n|} g(d_j)
+ \frac{1}{b_{n-1}} \sum_{j=1}^{|r_n| + |r_{n-1}|} g(d_j)
+ \cdots
+ \frac{1}{b_1} \sum_{j=1}^{|r_n| + |r_{n-1}| + \cdots + |r_1|} g(d_j)
\right\}
\]
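The RP formula can be sketched in Python as follows. This is only an illustration under stated assumptions: grades are the integers 1 to n, every document in $r_k$ has grade k, each $b_k$ is the grade sum of the top $|r_n| + \cdots + |r_k|$ documents in an ideal ranking, and a level whose $b_k$ is 0 is skipped. The function name `graded_rp` and the `grade_counts` mapping are hypothetical.

```python
from typing import Dict, Sequence


def graded_rp(ranked_grades: Sequence[int], grade_counts: Dict[int, int]) -> float:
    """Graded recall-level precision for one ranked list (hypothetical helper).

    ranked_grades[j - 1] is g(d_j) for the document at rank j in the result.
    grade_counts maps each grade k (1..n) to |r_k|, the number of documents
    in the collection with that grade.
    """
    n = max(grade_counts)
    cutoff = 0   # |r_n| + ... + |r_k|
    b_k = 0.0    # best possible grade sum in the top `cutoff` ranks
    total = 0.0
    for k in range(n, 0, -1):
        size = grade_counts.get(k, 0)
        cutoff += size
        b_k += k * size  # assumption: ideal ranking puts higher grades first
        if b_k > 0:      # assumption: an empty level contributes nothing
            total += sum(ranked_grades[:cutoff]) / b_k
    return total / n


# Three-category relevance (grades 0, 1, 2) with |r_2| = 1 and |r_1| = 2.
counts = {2: 1, 1: 2}
ideal_rp = graded_rp([2, 1, 1], counts)
mixed_rp = graded_rp([0, 1, 2, 1], counts)
```

In the second call the grade-2 document sits at rank 3, so the first level (top 1 document) scores 0 and the second level (top 3 documents) scores 3/4, giving 0.375 after dividing by n = 2.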
With binary relevance judgement or graded relevance judgement, all these
metrics are in the range [0,1], where 0 represents the least effective result
and 1 represents the most effective result.
For the investigation, we use two groups of runs submitted to TREC: the TREC 9 and
TREC 2001 Web tracks. One major reason for choosing these two groups is that
three-category relevance judgement is used for both, while in many other groups
binary relevance judgement is commonly used. From all 105 runs submitted to
the TREC 9 Web track and 97 runs submitted to the TREC 2001 Web track,
we select those that include 1000 documents for each of the queries. Thus we
obtained 53 runs in TREC 9 and 34 runs in TREC 2001.² Removing those runs with
fewer documents provides us with a homogeneous environment for the investigation,
and it should help us obtain more reliable experimental results.
² See the appendix for the list of runs selected in each group.
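The selection step, keeping only runs that list 1000 documents for every query, might be checked with a helper like the one below. It is a sketch that assumes each run is stored in the standard six-column TREC run format (topic, `Q0`, document number, rank, score, run tag); the function name `has_full_depth` and the sample lines are hypothetical.

```python
from collections import Counter
from typing import Iterable


def has_full_depth(run_lines: Iterable[str], num_queries: int, depth: int = 1000) -> bool:
    """Return True if the run lists exactly `depth` documents for each query.

    Assumes six whitespace-separated fields per line:
    topic, "Q0", docno, rank, score, run tag.
    """
    per_query: Counter = Counter()
    for line in run_lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        per_query[fields[0]] += 1
    return len(per_query) == num_queries and all(
        count == depth for count in per_query.values()
    )


# Tiny made-up run with two queries and a depth of 2, for illustration only.
sample = [
    "451 Q0 docA 1 9.5 tag",
    "451 Q0 docB 2 9.1 tag",
    "452 Q0 docC 1 8.0 tag",
    "452 Q0 docD 2 7.0 tag",
]
full = has_full_depth(sample, num_queries=2, depth=2)
short = has_full_depth(sample[:3], num_queries=2, depth=2)
```

Here `full` is true because both queries have two documents, while `short` is false because the second query is left with only one.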
 