Statistical Ranking Framework - Learning to Rank for Information Retrieval


16.1 Document Ranking Framework

In the document ranking framework [1-3, 5, 7, 10], only documents (together with their labels) are treated as i.i.d. instances; no comparable assumption is made about queries. Under this assumption, a different risk can be defined for each approach to learning to rank.

16.1.1 The Pointwise Approach

For the pointwise approach, suppose that $(x_j, y_j)$ are i.i.d. random variables according to distribution $P$, where $x \in \mathcal{X}$ stands for the document and $y \in \mathcal{Y}$ stands for the ground-truth label of the document. Given the scoring function $f$, a loss occurs if the prediction given by $f$ is not in accordance with the given label. Here we use $L(f; x_j, y_j)$ as a general representation of loss functions. It can be the pointwise 0-1 loss, or the surrogate loss functions used by various pointwise ranking algorithms.

Given the loss function, the expected risk is defined as

$$R(f) = \int_{\mathcal{X} \times \mathcal{Y}} L(f; x_j, y_j) \, P(dx_j, dy_j). \tag{16.1}$$
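
As a concrete instance of the expected risk, consider the pointwise 0-1 loss mentioned above. The indicator notation $\mathbb{I}\{\cdot\}$ and the convention that $f(x_j)$ is compared directly with the label $y_j$ are assumptions made here for illustration; they are not taken from the text. Under these assumptions, the expected risk reduces to the probability that the prediction disagrees with the label:

$$L(f; x_j, y_j) = \mathbb{I}\{f(x_j) \neq y_j\} \quad \Longrightarrow \quad R(f) = \int_{\mathcal{X} \times \mathcal{Y}} \mathbb{I}\{f(x_j) \neq y_j\} \, P(dx_j, dy_j) = P\big(f(x_j) \neq y_j\big).$$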

Intuitively, the expected risk is the loss that a ranking model $f$ would incur on a random document. Since it is almost impossible to compute the expected risk, in practice the empirical risk on the training set is used as an estimate of it. In particular, given the training data $\{(x_j, y_j)\}_{j=1}^{m}$, the empirical risk can be defined as follows:

$$\hat{R}(f) = \frac{1}{m} \sum_{j=1}^{m} L(f; x_j, y_j). \tag{16.2}$$
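
The empirical risk is simply the average loss over the m training documents, which can be sketched in a few lines of Python. This is a minimal illustration, not an implementation from the text: the names `empirical_risk`, `zero_one_loss`, `squared_loss`, and `toy_scorer`, the rounding rule used to turn a score into a predicted label, and the toy data are all hypothetical.

```python
def zero_one_loss(f, x, y):
    """Pointwise 0-1 loss. Assumption: the score is rounded to the
    nearest integer to obtain a predicted relevance label."""
    return 0.0 if round(f(x)) == y else 1.0


def squared_loss(f, x, y):
    """A surrogate pointwise loss: squared error between score and label."""
    return (f(x) - y) ** 2


def empirical_risk(f, data, loss):
    """Average of loss(f, x_j, y_j) over the training pairs (x_j, y_j)."""
    if not data:
        raise ValueError("training data must not be empty")
    return sum(loss(f, x, y) for x, y in data) / len(data)


def toy_scorer(x):
    """Hypothetical linear scoring function f over two features."""
    return x[0] + 0.5 * x[1]


# Toy training set {(x_j, y_j)}, j = 1..4 (made-up values).
train = [
    ([2.0, 0.0], 2),
    ([1.0, 0.5], 1),
    ([0.0, 0.5], 1),
    ([1.0, 2.0], 0),
]

risk_01 = empirical_risk(toy_scorer, train, zero_one_loss)
risk_sq = empirical_risk(toy_scorer, train, squared_loss)
print(risk_01, risk_sq)
```

Passing the loss as a parameter mirrors the role of L as a general representation of loss functions: the same averaging applies whether L is the 0-1 loss or a surrogate.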

16.1.2 The Pairwise Approach

In the pairwise approach, document pairs are the learning instances. There are two views on this approach in the document ranking framework. The first, which we call the U-statistics view, assumes that documents are i.i.d. random variables; the second, which we call the average view, assumes that document pairs are i.i.d. random variables. Both views are valid under certain conditions. For example, when the relevance degree of each document is used as the ground truth, the U-statistics view is more reasonable. However, if pairwise preferences between documents are given as the ground-truth labels, it might be more reasonable to take the average view.
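
The practical difference between the two views shows up in how training pairs are obtained and averaged. The sketch below is an illustration under stated assumptions, not the definition given in the text: the hinge-style pairwise loss, the rule that derives a pair label from two relevance degrees, the normalization over all m(m-1)/2 unordered pairs, and every function name are hypothetical choices.

```python
from itertools import combinations


def pairwise_hinge(f, x_u, x_v, y_uv):
    """Hinge-style pairwise loss (an assumed surrogate).
    y_uv is +1 if x_u should rank above x_v, and -1 otherwise."""
    return max(0.0, 1.0 - y_uv * (f(x_u) - f(x_v)))


def risk_u_statistics_view(f, docs, loss):
    """docs: list of (x, relevance_degree). Pairs are formed from the
    documents, and the loss is averaged over all unordered pairs.
    Assumption: ties are mapped to -1 purely for simplicity."""
    pairs = list(combinations(docs, 2))
    if not pairs:
        raise ValueError("need at least two documents")
    total = 0.0
    for (x_u, y_u), (x_v, y_v) in pairs:
        y_uv = 1 if y_u > y_v else -1
        total += loss(f, x_u, x_v, y_uv)
    return total / len(pairs)


def risk_average_view(f, pairs, loss):
    """pairs: list of (x_u, x_v, y_uv) given directly as preferences.
    The loss is averaged over the given pairs only."""
    if not pairs:
        raise ValueError("need at least one pair")
    return sum(loss(f, x_u, x_v, y_uv) for x_u, x_v, y_uv in pairs) / len(pairs)


def toy_scorer(x):
    """Hypothetical scoring function: the first feature is the score."""
    return x[0]


# U-statistics view: documents labeled with relevance degrees.
docs = [([3.0], 2), ([1.0], 1), ([2.0], 0)]
# Average view: preference pairs given as ground truth.
prefs = [([3.0], [1.0], 1), ([1.0], [2.0], 1)]

risk_u = risk_u_statistics_view(toy_scorer, docs, pairwise_hinge)
risk_avg = risk_average_view(toy_scorer, prefs, pairwise_hinge)
print(risk_u, risk_avg)
```

In the first function the pairs are a by-product of the document sample, so the same document appears in several pairs; in the second, each given pair is treated as a sample in its own right.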
