Statistical Consistency for Ranking - Learning to Rank for Information Retrieval

Information Technology Reference

In-Depth Information

18.3.1 Regarding DCG-Based Ranking Error

In [ 6 ], the consistency issue with respect to DCG-based ranking error is studied. In

particular, the study is focused on a specific regression-based ranking method. The

loss function minimized by the method has the following form:

w j f(x j ) − y j 2

w j max 0 , f(x j ) − δ 2 .

L φ (f, x , y ) =

+ u sup

(18.5)

The weight function w j is chosen so that it focuses only on the most important

examples (e.g., those examples whose y j =

1) while w j

focuses on the examples

not covered by w j (e.g., those examples whose y j =

0). This loss function basically

requires that the output of the ranking function f on a relevant document should

be as close to 1 as possible, and at the same time requires that the output on an

irrelevant document should not be larger than a small positive value δ by much.

As for the minimization of the above loss function, the following result has been

obtained.

Theorem 18.2 For all scoring functions f , let r f =

f be the induced ranking

model . Let f B be the Bayes optimal scoring function , i . e ., f B (x j ) = y j , and denote

the corresponding ranking model as r B =

sort

◦

f B . Suppose f ∗

sort

◦

is the scoring

function learned by minimizing E

, then in certain conditions ( see [ 6 ]

for more details ) the following inequality holds :

E DCG(r B , x , y ) −

[

L φ (f

;

x , y )

]

E DCG(r f , x , y )

C E L φ (f

x , y ) −

E L φ (f ∗ ;

x , y ) 1 / 2 ,

≤

;

(18.6)

where C is a constant .

The above theorem basically says that in certain conditions, the minimization of

the expectation of the regression loss given in ( 18.5 ) will lead to the maximization of

the expected DCG value. In other words, the regression loss in ( 18.5 ) is statistically

consistent with DCG-based ranking error.

18.3.2 Regarding Permutation-Level 0-1 Loss

In [ 13 ], the consistency of listwise ranking methods is investigated, with respect to

the permutation-level 0-1 loss. The permutation-level 0-1 loss takes a value of 0 if

the ranked list given by the ranking function is exactly the same as the ground-truth

permutation; and takes a value of 1 otherwise.

In particular, two conditions for the consistency are given in [ 13 ]. One is for the

underlying probability space, called order preserving . The other is for the surrogate

loss function, called order sensitive . The intuitive explanation of these two concepts

are as follows.

Learning to Rank for Information Retrieval

Search WWH ::

Custom Search

Home