Statistical Ranking Framework - Learning to Rank for Information Retrieval


Chapter 16

Statistical Ranking Framework

Abstract In this chapter, we introduce the statistical ranking framework. To analyze the theoretical properties of learning-to-rank methods, the very first step is to establish the right probabilistic context for the analysis. This is what the statistical ranking framework addresses. We present three ranking frameworks used in the learning-to-rank literature: the document ranking framework, the subset ranking framework, and the two-layer ranking framework. The discussions in this chapter set the stage for the discussions of generalization ability and statistical consistency in the following chapters.

As mentioned in the previous chapter, a statistical ranking framework is needed to facilitate the discussions of the generalization ability and statistical consistency of learning-to-rank algorithms. The framework describes how the data samples are generated, and how the empirical and expected risks are defined.
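For concreteness, the two risks can be written in a generic form. The notation below is an assumption made here for illustration (a sample z drawn from a distribution P, a loss L, a ranking function f, and n training samples), not notation taken from this chapter. What counts as a single sample z is exactly the point on which the frameworks differ.

```latex
R(f) = \mathbb{E}_{z \sim P}\bigl[L(f; z)\bigr] = \int L(f; z)\, P(\mathrm{d}z),
\qquad
\hat{R}(f) = \frac{1}{n} \sum_{i=1}^{n} L(f; z_i),
\quad z_1, \dots, z_n \overset{\text{i.i.d.}}{\sim} P .
```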

In the learning-to-rank literature, three different statistical ranking frameworks have been used, which we call the document ranking, subset ranking, and two-layer ranking frameworks, respectively. Even for the same data, these frameworks give different probabilistic interpretations of the generation process. For example, the document ranking framework regards all the documents (no matter whether they are associated with the same query or not) as i.i.d. samples from a document space; the subset ranking framework instead assumes that the queries are i.i.d. samples from the query space, and that each query is associated with a deterministic set of documents; the two-layer ranking framework assumes i.i.d. sampling both for queries and for the documents associated with the same query. These frameworks also define risks in different manners. The document ranking framework defines the expected risk by integrating over all documents; the subset ranking framework defines it by integrating over all queries; and the two-layer ranking framework defines it by integrating over both queries and documents.
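The three sampling assumptions can be contrasted with a small simulation. The following is a minimal sketch under toy assumptions, not the chapter's formal definitions: the scoring function, the squared pointwise loss, the Gaussian distributions, and all function names are hypothetical choices made only to show where the randomness enters and at which level the empirical risk is averaged.

```python
import random


def score(x):
    """Hypothetical scoring function f(x); any ranker could stand in here."""
    return 0.5 * x


def loss(x, y):
    """Pointwise squared loss, chosen only to keep the sketch short."""
    return (score(x) - y) ** 2


def mean(values):
    return sum(values) / len(values)


def draw_document(rng, center=0.0):
    """Draw one (feature, label) pair from a toy document distribution."""
    x = rng.gauss(center, 1.0)
    y = 1.0 if x > center else 0.0
    return x, y


def document_ranking_risk(rng, n_docs):
    """Document ranking: documents are i.i.d. draws; queries play no role."""
    docs = [draw_document(rng) for _ in range(n_docs)]
    return mean([loss(x, y) for x, y in docs])


def fixed_document_set(q, m):
    """Deterministic set of m documents attached to query q (no randomness)."""
    return [(q + k, float(k % 2)) for k in range(m)]


def subset_ranking_risk(rng, n_queries, m):
    """Subset ranking: queries are i.i.d.; documents are fixed given the query."""
    per_query = []
    for _ in range(n_queries):
        q = rng.gauss(0.0, 1.0)
        docs = fixed_document_set(q, m)
        per_query.append(mean([loss(x, y) for x, y in docs]))
    return mean(per_query)


def two_layer_risk(rng, n_queries, m):
    """Two-layer: queries are i.i.d., then documents are i.i.d. given the query."""
    per_query = []
    for _ in range(n_queries):
        q = rng.gauss(0.0, 1.0)
        docs = [draw_document(rng, center=q) for _ in range(m)]
        per_query.append(mean([loss(x, y) for x, y in docs]))
    return mean(per_query)


rng = random.Random(0)
risks = {
    "document": document_ranking_risk(rng, 1000),
    "subset": subset_ranking_risk(rng, 100, 10),
    "two_layer": two_layer_risk(rng, 100, 10),
}
```

In this sketch the document ranking estimate averages over documents only, the subset ranking estimate averages over queries with no randomness inside a query, and the two-layer estimate averages first over the sampled documents of each query and then over queries. Averaging a pointwise loss within each query is a simplification made here; a loss defined on the whole document set could be substituted.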
