Information Technology Reference
In-Depth Information
m
j
={
y j }
ments associated with training query q , and the ground truth y
1 of these
documents in terms of multiple ordered categories, suppose a scoring function f is
used to rank these documents and the ranked list is π . A theory has been proven that
the NDCG-based ranking error can be bounded by the following regression loss:
=
2
η(j) 2
2
y j 2
1
1
2
m
m
f(x j )
1
Z m
1
NDCG (π, y )
,
(5.1)
j
=
1
j
=
1
where Z m is the maximum DCG value and η(j) is the discount factor used in
NDCG.
In other words, if one can really minimize the regression loss to zero, one can
also minimize (1
NDCG) to zero. This seems to be a very nice property of the
regression-based methods.
With similar proof techniques to those used in [ 4 ], Li et al. [ 7 ] show that (1
NDCG) can also be bounded by the multi-class classification loss as shown below
(it is assumed that there are five ordered categories, i.e., K =
5 in the inequality):
2
η(j) m
m
m
m
15
Z m
1
NDCG (π, y )
η(j) 2
m
·
I { y j = y j } ,
(5.2)
j
=
1
j
=
1
j
=
1
where
y j is the prediction on the label of x j by the multi-class classifier, and
ˆ
= K 1
k
f(x j )
k) .
In other words, if one can really minimize the classification loss to zero, one can
also minimize (1
k
·
P(
y j =
ˆ
=
0
NDCG) to zero at the same time.
However, on the other hand, please note that when (1
NDCG) is zero (i.e., the
documents are perfectly ranked), the regression loss and the classification loss might
not be zero (and can still be very large). That is, the minimization of the regression
loss and the classification loss is only a sufficient condition but not a necessary
condition for optimal ranking in terms of NDCG.
Let us have a close look at the classification bound in ( 5.2 ) with an example. 3
Note that a similar example has been given in [ 1 ] to show the problem of reducing
ranking to binary classification.
Suppose for a particular query q , we have four documents (i.e., m =
4) in total,
and their ground-truth labels are 4, 3, 2, and 1, respectively (i.e., y 1 =
4, y 2 =
3,
y 3 =
2, y 4 =
1). We use the same discount factor and gain function as in [ 7 ]. Then
it is easy to compute that Z m = j = 1
1
log (j +
1 ) ( 2 y j 1 )
21 . 35.
Then, suppose the outputs of the multi-class classifier are
y 1 =
ˆ
3,
y 2 =
ˆ
2,
y 3 =
ˆ
1,
and
0, with 100% confidence in the prediction for each document. It is easy to
compute that ( 1
y 4 =
ˆ
NDCG ) is 0 and we actually get a perfect ranking based on the
classifier. However, in terms of multi-class classification, we made errors in all four
3 One can get similar results for the regression bound given in inequality ( 5.1 ).     Search WWH ::

Custom Search