Given the m documents associated with a training query q, and the ground-truth labels $\mathbf{y} = \{y_j\}_{j=1}^{m}$ of these documents in terms of multiple ordered categories, suppose a scoring function f is used to rank these documents and the ranked list is π. It has been proven that the NDCG-based ranking error can be bounded by the following regression loss:

$$1 - \mathrm{NDCG}(\pi, \mathbf{y}) \le \frac{2}{Z_m}\left(\sum_{j=1}^{m}\eta(j)^{2}\right)^{\frac{1}{2}}\left(\sum_{j=1}^{m}\bigl(f(x_j) - y_j\bigr)^{2}\right)^{\frac{1}{2}}, \qquad (5.1)$$

where Z_m is the maximum DCG value and η(j) is the discount factor used in NDCG.
In other words, if one can really minimize the regression loss to zero, one can also minimize (1 − NDCG) to zero. This seems to be a very nice property of the regression-based methods.
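To make the bound concrete, the following short Python sketch evaluates both sides of inequality (5.1) for a single toy query. It assumes the gain function 2^y − 1 and the discount 1/log₂(j + 1) that are also used in the numerical example later in this section; the labels and scores themselves are invented for illustration and do not come from any real dataset.

```python
import math

def dcg(labels_in_rank_order):
    # DCG with gain 2^y - 1 and discount 1 / log2(position + 1)
    return sum((2 ** y - 1) / math.log2(pos + 2)
               for pos, y in enumerate(labels_in_rank_order))

def regression_bound_check(labels, scores):
    """Evaluate both sides of inequality (5.1) for a single query."""
    m = len(labels)
    ideal = dcg(sorted(labels, reverse=True))          # Z_m, the maximum DCG
    order = sorted(range(m), key=lambda j: scores[j], reverse=True)
    lhs = 1 - dcg([labels[j] for j in order]) / ideal  # 1 - NDCG(pi, y)
    eta_norm = math.sqrt(sum((1 / math.log2(j + 2)) ** 2 for j in range(m)))
    reg_norm = math.sqrt(sum((s - y) ** 2 for s, y in zip(scores, labels)))
    rhs = 2 / ideal * eta_norm * reg_norm              # regression-loss bound
    return lhs, rhs

# Toy query: labels and scores below are invented for illustration only.
labels = [4, 3, 2, 1]
scores = [2.5, 2.6, 0.7, 0.1]   # f(x_j); the top two documents get swapped
lhs, rhs = regression_bound_check(labels, scores)
print(f"1 - NDCG = {lhs:.3f} <= bound = {rhs:.3f}")   # e.g. 0.138 <= 0.281
```

For these invented numbers the bound holds comfortably, though it is generally not tight.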
With proof techniques similar to those used in [4], Li et al. [7] show that (1 − NDCG) can also be bounded by the multi-class classification loss, as shown below (it is assumed that there are five ordered categories, i.e., K = 5, in the inequality):

$$1 - \mathrm{NDCG}(\pi, \mathbf{y}) \le \frac{15}{Z_m}\left(\sum_{j=1}^{m}\eta(j)^{2}\right)^{\frac{1}{2}}\left(2\sum_{j=1}^{m} I_{\{y_j \ne \hat{y}_j\}}\right)^{\frac{1}{2}}, \qquad (5.2)$$
where ŷ_j is the prediction on the label of x_j given by the multi-class classifier, and $f(x_j) = \sum_{k=0}^{K-1} k \cdot P(\hat{y}_j = k)$.

In other words, if one can really minimize the classification loss to zero, one can also minimize (1 − NDCG) to zero at the same time.
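As a small illustration of this expected-label scoring rule, the sketch below computes the ranking score of each document as the expectation of its label under the classifier's predicted distribution over the K = 5 categories; the class-probability vectors are invented purely for illustration.

```python
def expected_label_score(class_probs):
    """Ranking score f(x_j) = sum_k k * P(y_hat_j = k) over K ordered categories."""
    return sum(k * p for k, p in enumerate(class_probs))

# Hypothetical per-document distributions over the categories {0, 1, 2, 3, 4}.
doc_probs = [
    [0.0, 0.0, 0.1, 0.3, 0.6],   # most mass on label 4
    [0.0, 0.1, 0.2, 0.6, 0.1],   # most mass on label 3
]
scores = [expected_label_score(p) for p in doc_probs]
print(scores)   # approximately [3.5, 2.7]; documents are then ranked by these scores
```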
On the other hand, please note that when (1 − NDCG) is zero (i.e., the documents are perfectly ranked), the regression loss and the classification loss might not be zero (and can still be very large). That is, minimizing the regression loss or the classification loss is a sufficient but not a necessary condition for optimal ranking in terms of NDCG.
Let us have a closer look at the classification bound in (5.2) with an example.³ Note that a similar example has been given in [1] to show the problem of reducing ranking to binary classification.
Suppose for a particular query q we have four documents (i.e., m = 4) in total, and their ground-truth labels are 4, 3, 2, and 1, respectively (i.e., y_1 = 4, y_2 = 3, y_3 = 2, y_4 = 1). We use the same discount factor and gain function as in [7]. Then it is easy to compute that $Z_m = \sum_{j=1}^{m} \frac{1}{\log_2(j+1)}\bigl(2^{y_j} - 1\bigr) \approx 21.35$.
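As a quick check of this number, a few lines of Python reproduce it, assuming a base-2 logarithm in the discount (which is what yields 21.35):

```python
import math

labels = [4, 3, 2, 1]   # y_1, ..., y_4, already listed in the ideal order
z_m = sum((2 ** y - 1) / math.log2(j + 2)   # discount 1 / log2(position + 1)
          for j, y in enumerate(labels))
print(round(z_m, 2))    # 21.35
```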
Then, suppose the outputs of the multi-class classifier are ŷ_1 = 3, ŷ_2 = 2, ŷ_3 = 1, and ŷ_4 = 0, with 100% confidence in the prediction for each document. It is easy to compute that (1 − NDCG) is 0, and we actually get a perfect ranking based on the classifier. However, in terms of multi-class classification, we made errors on all four documents.
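The following sketch replays this example numerically: ranking by the (100%-confident) predictions ŷ = (3, 2, 1, 0) reproduces the ideal ordering, so 1 − NDCG is 0, while the multi-class classification loss counts all four documents as errors; plugging four errors into the right-hand side of (5.2) gives a value well above 1, so the bound holds here but is vacuous. The helper below simply re-implements the DCG used in this example.

```python
import math

def dcg(labels_in_rank_order):
    # gain 2^y - 1, discount 1 / log2(position + 1)
    return sum((2 ** y - 1) / math.log2(j + 2)
               for j, y in enumerate(labels_in_rank_order))

y_true = [4, 3, 2, 1]    # ground-truth labels of the four documents
y_pred = [3, 2, 1, 0]    # classifier outputs; 100% confidence => scores equal y_pred

# Ranking induced by the predicted labels (no ties here)
order = sorted(range(4), key=lambda j: y_pred[j], reverse=True)
one_minus_ndcg = 1 - dcg([y_true[j] for j in order]) / dcg(sorted(y_true, reverse=True))
errors = sum(t != p for t, p in zip(y_true, y_pred))

print(one_minus_ndcg)   # 0.0  -> the ranking is perfect
print(errors)           # 4    -> every document is misclassified
```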
³ One can get similar results for the regression bound given in inequality (5.1).