Statistical Ranking Framework - Learning to Rank for Information Retrieval


assumed to be a random variable with probabilistic distribution $D_q$, and $R(f)$ can be defined as follows:

$$R(f) = \int_{\mathcal{Q}} \int_{\mathcal{X}^2 \times \mathcal{Y}} L(f; x_u, x_v, y_{u,v}) \, D_q(dx_u, dx_v, dy_{u,v}) \, P_Q(dq). \tag{16.19}$$

As both the distributions $P_Q$ and $D_q$ are unknown, the following empirical risk is used to estimate $R(f)$:

$$\hat{R}(f) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{m^{(i)}} \sum_{j=1}^{m^{(i)}} L\bigl(f; x^{(i)}_{j_1}, x^{(i)}_{j_2}, y^{(i)}_{j_1,j_2}\bigr). \tag{16.20}$$
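The double average in (16.20) can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the logistic pair loss, the label convention (+1 when the first document should rank above the second, -1 otherwise), and the names `logistic_pair_loss` and `empirical_pairwise_risk` are assumptions made for the example, and every query is assumed to contribute at least one sampled pair.

```python
import math
from typing import Callable, Sequence, Tuple

# One sampled pair: (features of x_j1, features of x_j2, pair label y_j1,j2).
# Assumed convention for this sketch: label +1 if the first document should
# rank above the second, -1 otherwise.
Pair = Tuple[Sequence[float], Sequence[float], int]
Scorer = Callable[[Sequence[float]], float]


def logistic_pair_loss(f: Scorer, pair: Pair) -> float:
    """Hypothetical choice of L: logistic loss on the score difference."""
    x1, x2, y = pair
    margin = y * (f(x1) - f(x2))
    # Numerically stable form of log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def empirical_pairwise_risk(f: Scorer,
                            queries: Sequence[Sequence[Pair]],
                            loss=logistic_pair_loss) -> float:
    """(1/n) * sum_i (1/m_i) * sum_j L(f; x_j1, x_j2, y_j1,j2)."""
    n = len(queries)
    total = 0.0
    for pairs in queries:  # outer average over the n sampled queries
        m_i = len(pairs)   # number of sampled pairs for this query
        total += sum(loss(f, p) for p in pairs) / m_i  # inner average
    return total / n


# Toy usage: a one-dimensional scorer and two queries with different m_i.
score = lambda x: x[0]
toy_queries = [
    [((2.0,), (1.0,), 1)],
    [((0.5,), (1.5,), -1), ((1.0,), (0.0,), 1), ((0.0,), (3.0,), -1)],
]
print(empirical_pairwise_risk(score, toy_queries))
```

Because the inner sum is divided by $m^{(i)}$ before the outer average, each query carries the same weight regardless of how many pairs were sampled for it.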

16.3.3 The Listwise Approach

Note that most existing listwise ranking algorithms assume that the listwise loss function takes all the documents associated with a query as input, with no sampling of these documents. Therefore, the two-layer ranking framework does not explain the existing listwise ranking methods in a straightforward manner; the algorithms would need some modification to fit into the framework. For simplicity, we will not discuss the marriage between the two-layer ranking framework and the listwise approach in this topic.
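To make the point about inputs concrete, the sketch below shows one possible listwise loss, a ListNet-style top-one cross-entropy, chosen here only as an illustration. The function names `log_softmax` and `listwise_top_one_loss` are hypothetical. What matters is the signature: the loss receives the scores and labels of all documents of a query at once, so there is no per-document or per-pair sample to average over.

```python
import math
from typing import List, Sequence


def log_softmax(values: Sequence[float]) -> List[float]:
    """Log of the softmax distribution over one query's documents."""
    shift = max(values)  # subtract the max for numerical stability
    log_norm = math.log(sum(math.exp(v - shift) for v in values))
    return [v - shift - log_norm for v in values]


def listwise_top_one_loss(scores: Sequence[float],
                          labels: Sequence[float]) -> float:
    """Cross-entropy between the top-one distributions induced by the
    labels and by the scores (a ListNet-style loss, used as an example).

    Both arguments cover ALL documents associated with a single query.
    """
    if len(scores) != len(labels) or len(scores) == 0:
        raise ValueError("scores and labels must be non-empty and aligned")
    target = [math.exp(lp) for lp in log_softmax(labels)]
    predicted_log = log_softmax(scores)
    return -sum(t * lp for t, lp in zip(target, predicted_log))


# Toy usage: scores that agree with the labels incur a smaller loss
# than scores that reverse the order.
labels = [2.0, 1.0, 0.0]
print(listwise_top_one_loss([2.0, 1.0, 0.0], labels))
print(listwise_top_one_loss([0.0, 1.0, 2.0], labels))
```

Fitting such a loss into the two-layer framework would require redefining it on a sampled subset of the documents, which is the kind of modification referred to above.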

16.4 Summary

In this chapter, we have introduced three major statistical ranking frameworks used in the literature. The document ranking framework assumes the i.i.d. distribution of documents, regardless of the queries they belong to. The subset ranking framework ignores the sampling of documents per query and directly assumes the i.i.d. distribution of queries. The two-layer ranking framework considers the i.i.d. sampling of both queries and documents per query. It is clear that the two-layer ranking framework describes the real ranking problems in a more natural way. However, the other two frameworks can also be used to obtain certain theoretical results that can explain the behaviors of existing learning-to-rank methods. With the three frameworks, we give the definitions of the empirical and expected risks for different approaches to learning to rank. These definitions will be used intensively in the following two chapters, which are concerned with the generalization ability and statistical consistency of ranking methods.
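The practical effect of the different sampling assumptions can be seen in how an empirical pointwise risk is averaged. The following sketch contrasts pooling all documents (the document ranking view) with averaging within each query first (the two-layer view). The squared loss, the one-dimensional toy features, and the function names are assumptions made for illustration only.

```python
from typing import Callable, Sequence, Tuple

Doc = Tuple[float, float]  # (toy 1-D feature, relevance label)
Scorer = Callable[[float], float]


def squared_loss(f: Scorer, doc: Doc) -> float:
    """Hypothetical pointwise loss: squared error of the score."""
    x, y = doc
    return (f(x) - y) ** 2


def pooled_risk(f: Scorer, queries: Sequence[Sequence[Doc]]) -> float:
    """Document ranking view: all documents are treated as one i.i.d.
    sample, ignoring which query they belong to."""
    docs = [d for q in queries for d in q]
    return sum(squared_loss(f, d) for d in docs) / len(docs)


def two_layer_risk(f: Scorer, queries: Sequence[Sequence[Doc]]) -> float:
    """Two-layer view: average over documents within each query, then
    average over queries, so every query has equal weight."""
    per_query = [sum(squared_loss(f, d) for d in q) / len(q) for q in queries]
    return sum(per_query) / len(per_query)


# Toy data: one query with a single badly scored document, and one
# query with three perfectly scored documents.
identity = lambda x: x
toy = [
    [(0.0, 2.0)],
    [(1.0, 1.0), (1.0, 1.0), (1.0, 1.0)],
]
print(pooled_risk(identity, toy))     # 1.0
print(two_layer_risk(identity, toy))  # 2.0
```

The two numbers differ because the pooled average lets queries with many documents dominate, whereas the two-layer average weights each query equally.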

16.5 Exercises

16.1 Compare the different probabilistic assumptions of the three ranking frameworks.
