\[
\begin{aligned}
b_n &= \sum_{i=1}^{|r_n|} g(d_i), \\
b_{n-1} &= \sum_{i=1}^{|r_n| + |r_{n-1}|} g(d_i), \\
&\;\;\vdots \\
b = b_1 &= \sum_{i=1}^{|r_n| + |r_{n-1}| + \ldots + |r_1|} g(d_i)
\end{aligned}
\]
AP can be defined as
\[
AP = \frac{1}{b} \sum_{i=1}^{total\_n} \left( g(d_{p_i}) \, \frac{\sum_{j=1}^{i} g(d_{p_j})}{p_i} \right)
\]
Here $p_j$ is the ranking position of the $j$-th document whose grade is above 0, and $\sum_{j=1}^{i} g(d_{p_j})$ is the total sum of grades for documents up to rank $p_i$. Considering all the $total\_n$ documents in the whole collection whose grades are above 0, AP needs to calculate the precision at all these document levels ($p_1, p_2, \ldots, p_{total\_n}$). At any $p_i$, precision is calculated as $\sum_{j=1}^{i} g(d_{p_j}) / p_i$, and then a weight of $g(d_{p_i})$ is applied. In this way, documents with higher grades make a bigger contribution to the final value of AP.
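The AP computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `graded_average_precision` and its parameters are hypothetical, the result list is assumed to be given as the grades of its documents in rank order, and $b$ is assumed to be supplied by the caller.

```python
from typing import Sequence


def graded_average_precision(ranked_grades: Sequence[float], b: float) -> float:
    """Graded AP for one query (hypothetical helper).

    ranked_grades[k] is g(d) for the document at rank k + 1 (0 = not relevant).
    b is the sum of grades of all positively graded documents in the collection.
    """
    running_sum = 0.0  # sum of g(d_{p_j}) for j = 1..i
    score = 0.0
    for rank, grade in enumerate(ranked_grades, start=1):
        if grade > 0:
            running_sum += grade
            # precision at p_i, weighted by the grade of the document at p_i
            score += grade * running_sum / rank
    return score / b


# With binary grades the value matches the ordinary average precision:
# relevant documents at ranks 1 and 3, two relevant documents overall.
ap_binary = graded_average_precision([1, 0, 1], b=2)
```

Positively graded documents that are missing from the result list contribute nothing to the sum, which is how the loop treats them implicitly.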
For RP, we first consider only the top $|r_n|$ documents, where $\frac{1}{b_n} \sum_{j=1}^{|r_n|} g(d_j)$ can be used to evaluate their precision; next we consider the top $|r_n| + |r_{n-1}|$ documents, where $\frac{1}{b_{n-1}} \sum_{j=1}^{|r_n| + |r_{n-1}|} g(d_j)$ can be used to evaluate their precision; we continue this process until finally we consider all top $total\_n$ documents using $\frac{1}{b} \sum_{j=1}^{|r_n| + \ldots + |r_1|} g(d_j)$.
Combining all these, we have
\[
RP = \frac{1}{n} \left\{ \frac{1}{b_n} \sum_{j=1}^{|r_n|} g(d_j) + \frac{1}{b_{n-1}} \sum_{j=1}^{|r_n| + |r_{n-1}|} g(d_j) + \ldots + \frac{1}{b_1} \sum_{j=1}^{|r_n| + |r_{n-1}| + \ldots + |r_1|} g(d_j) \right\}
\]
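As an illustration of the RP definition, the following Python sketch computes the cutoffs $|r_n|, |r_n| + |r_{n-1}|, \ldots$ and the normalizers $b_n, \ldots, b_1$ from the grades of all positively graded documents, then averages the per-level ratios. The function name and its parameters are hypothetical, and the sketch assumes that every one of the $n$ positive grade levels contains at least one document, so that $n$ equals the number of distinct positive grades.

```python
from collections import Counter
from itertools import accumulate
from typing import Sequence


def graded_r_precision(ranked_grades: Sequence[float],
                       relevant_grades: Sequence[float]) -> float:
    """Graded RP for one query (hypothetical helper).

    ranked_grades[k] is g(d) for the document at rank k + 1 in the result list.
    relevant_grades holds g(d) for every positively graded document in the
    collection, in any order.
    """
    # Best list: all positively graded documents, highest grade first.
    best = sorted((g for g in relevant_grades if g > 0), reverse=True)
    if not best:
        return 0.0
    counts = Counter(best)
    # |r_n|, |r_{n-1}|, ..., |r_1| in decreasing grade order.
    sizes = [counts[g] for g in sorted(counts, reverse=True)]
    cutoffs = list(accumulate(sizes))  # |r_n|, |r_n| + |r_{n-1}|, ...
    total = 0.0
    for cutoff in cutoffs:
        b_k = sum(best[:cutoff])               # best achievable sum at this cutoff
        achieved = sum(ranked_grades[:cutoff])  # sum actually achieved by the run
        total += achieved / b_k
    return total / len(cutoffs)  # assumption: n = number of distinct positive grades


# Three relevant documents with grades 2, 1, 1; the run ranks them 2nd, 1st, 4th.
rp = graded_r_precision([1, 2, 0, 1], [2, 1, 1])  # (1/2 + 3/4) / 2 = 0.625
```

With binary grades there is a single cutoff equal to the number of relevant documents, so the value reduces to the ordinary R-precision.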
With either binary or graded relevance judgment, all the metrics defined above lie in the range [0,1], where 0 represents the least effective result and 1 represents the most effective result.
For the investigation, we use two groups of runs submitted to TREC: those of the TREC 9 and TREC 2001 Web tracks. One major reason for choosing these two groups is that three-category relevance judgment is used for both, whereas binary relevance judgment is commonly used in many other groups. From all 105 runs submitted to the TREC 9 Web track and 97 runs submitted to the TREC 2001 Web track, we select those that include 1000 documents for each of the queries. Thus we obtain 53 runs in TREC 9 and 34 runs in TREC 2001.² Removing the runs with fewer documents gives us a homogeneous environment for the investigation, which should help us obtain more reliable experimental results.

² See the appendix for the list of runs selected in each group.
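The selection rule for runs can be expressed as a simple check. The sketch below is only an assumption about how such a filter might look: the function name, the representation of a run as (query id, document id) pairs, and the `depth` parameter are all hypothetical and not taken from the source.

```python
from collections import Counter
from typing import Iterable, Tuple


def has_full_depth(run_rows: Iterable[Tuple[str, str]],
                   query_ids: Iterable[str],
                   depth: int = 1000) -> bool:
    """Return True if the run lists at least `depth` documents for every query.

    run_rows: (query_id, doc_id) pairs of one run (hypothetical representation).
    query_ids: all queries the run is expected to answer.
    """
    per_query = Counter(query_id for query_id, _doc_id in run_rows)
    # A query that is absent from the run has a count of 0 and fails the check.
    return all(per_query[query_id] >= depth for query_id in query_ids)


# Toy usage with depth=2: query "q2" has only one document, so the run is rejected.
rows = [("q1", "d1"), ("q1", "d2"), ("q2", "d3")]
keep = has_full_depth(rows, ["q1", "q2"], depth=2)
```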