(1) The U-statistics View In the U-statistics view, suppose that $(x_u, y_u)$ and $(x_v, y_v)$ are i.i.d. random variables according to distribution $P$ on $X \times Y$, where $X$ stands for the document and $Y$ stands for the ground-truth label of the document. $(x_u, y_u)$ and $(x_v, y_v)$ construct a pair. Given the scoring function $f$, a loss occurs if the documents are not ranked according to their ground-truth labels. Suppose the loss function is $L(f; x_u, x_v, y_{u,v})$, where $y_{u,v} = 2 \cdot I_{\{y_u > y_v\}} - 1$. Again this loss function can represent the pairwise 0-1 loss, or the pairwise surrogate loss functions used by different algorithms. Then the expected risk is defined as

$$R(f) = \int_{(X \times Y)^2} L(f; x_u, x_v, y_{u,v})\, P(dx_u, dy_u)\, P(dx_v, dy_v). \tag{16.3}$$
The expected risk is the average loss that a ranking model $f$ would incur on two random documents. Again, since it is almost impossible to compute the expected risk, the empirical risk on the training set is used as its estimate. Given the training data, the empirical risk can be defined as the following U-statistic:
$$\hat{R}(f) = \frac{2}{m(m-1)} \sum_{u=1}^{m} \sum_{v=u+1}^{m} L(f; x_u, x_v, y_{u,v}). \tag{16.4}$$
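To make the U-statistics empirical risk concrete, here is a minimal Python sketch that evaluates it with the pairwise 0-1 loss; the scoring function and toy data are illustrative assumptions, not part of the text:

```python
from itertools import combinations

def pairwise_01_loss(f, x_u, x_v, y_uv):
    # 0-1 loss: a loss of 1 is incurred when the pair is not ranked
    # in agreement with y_uv (y_uv = +1 means x_u should rank above x_v).
    return 1.0 if y_uv * (f(x_u) - f(x_v)) <= 0 else 0.0

def empirical_risk_u(f, xs, ys):
    # Eq. (16.4): 2/(m(m-1)) times the sum of the loss over all pairs u < v.
    m = len(xs)
    total = 0.0
    for u, v in combinations(range(m), 2):
        y_uv = 2 * (1 if ys[u] > ys[v] else 0) - 1  # y_uv = 2 * I{y_u > y_v} - 1
        total += pairwise_01_loss(f, xs[u], xs[v], y_uv)
    return 2.0 * total / (m * (m - 1))
```

For three documents whose relevance labels agree with a single feature value, a monotone scoring function (e.g., the identity) attains empirical risk 0, while the reversed scoring attains risk 1.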
Specifically, when the ground truth is given as a binary relevance degree, the positive example is denoted as $x^+$, drawn according to $P^+$, and the negative example is denoted as $x^-$, drawn according to $P^-$. Given the training data $x^+ = \{x_j^+\}_{j=1}^{m^+}$ and $x^- = \{x_j^-\}_{j=1}^{m^-}$ (where $m^+$ and $m^-$ are the numbers of positive and negative examples in the training data, respectively), the expected risk and the empirical risk can be refined as follows:

$$R(f) = \int_{X^2} L(f; x^+, x^-)\, P^+(dx^+)\, P^-(dx^-), \tag{16.5}$$

$$\hat{R}(f) = \frac{1}{m^+ m^-} \sum_{u=1}^{m^+} \sum_{v=1}^{m^-} L(f; x_u^+, x_v^-). \tag{16.6}$$
In this case, we usually call the problem a bipartite ranking problem.
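In the bipartite case, the empirical risk under the pairwise 0-1 loss is simply the fraction of mis-ordered positive/negative pairs; with this loss, $1 - \hat{R}(f)$ coincides with the AUC (ignoring ties in scores). A minimal sketch, with assumed toy data:

```python
def empirical_risk_bipartite(f, x_pos, x_neg):
    # Eq. (16.6): average the loss over all m+ * m- positive/negative pairs.
    total = 0.0
    for xp in x_pos:
        for xn in x_neg:
            # 0-1 loss: a positive document should score above a negative one.
            total += 1.0 if f(xp) - f(xn) <= 0 else 0.0
    return total / (len(x_pos) * len(x_neg))
```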
(2) The Average View The average view assumes an i.i.d. distribution of document pairs. More specifically, with the average view, each document pair $(x_u, x_v)$ is given a ground-truth label $y_{u,v} \in Y = \{-1, 1\}$, where $y_{u,v} = 1$ indicates that document $x_u$ is more relevant than $x_v$, and $y_{u,v} = -1$ otherwise. Then $(x_u, x_v, y_{u,v})$ is assumed to be a random variable with probabilistic distribution $P$, and the expected risk can be defined as

$$R(f) = \int_{X^2 \times Y} L(f; x_u, x_v, y_{u,v})\, P(dx_u, dx_v, dy_{u,v}). \tag{16.7}$$
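Under the average view, the natural empirical counterpart of the expected risk is the mean loss over the i.i.d. labeled pairs. The sketch below uses the pairwise logistic surrogate loss as one illustrative choice of $L$; the data and function names are assumptions for illustration:

```python
import math

def pairwise_logistic_loss(f, x_u, x_v, y_uv):
    # Logistic surrogate of the pairwise 0-1 loss (one illustrative choice of L).
    return math.log(1.0 + math.exp(-y_uv * (f(x_u) - f(x_v))))

def empirical_risk_average(f, labeled_pairs):
    # Mean loss over i.i.d. labeled pairs (x_u, x_v, y_uv), an empirical
    # estimate of the expected risk in Eq. (16.7).
    return sum(pairwise_logistic_loss(f, xu, xv, y)
               for xu, xv, y in labeled_pairs) / len(labeled_pairs)
```

A scoring function that orders the pairs correctly yields a strictly smaller surrogate risk than one that reverses them, which is the behavior the surrogate is designed to reward.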