Chapter 16
Statistical Ranking Framework
Abstract
In this chapter, we introduce the statistical ranking framework. To analyze the
theoretical properties of learning-to-rank methods, the very first step is to
establish the right probabilistic context for the analysis, which is exactly what the
statistical ranking framework provides. In this chapter, we describe three ranking
frameworks used in the learning-to-rank literature, namely the document ranking
framework, the subset ranking framework, and the two-layer ranking framework.
The discussions in this chapter set the stage for the analysis of generalization
ability and statistical consistency in the following chapters.
As mentioned in the previous chapter, a statistical ranking framework is needed to
facilitate the discussion of the generalization ability and statistical consistency
of learning-to-rank algorithms. Such a framework describes how the data samples
are generated and how the empirical and expected risks are defined.
In the literature of learning to rank, three different statistical ranking frameworks
have been used, which we call the document ranking, subset ranking, and two-layer
ranking frameworks, respectively. Even for the same data, these frameworks give
different probabilistic interpretations of how the data are generated. For example,
the document ranking framework regards all the documents (regardless of whether
they are associated with the same query) as i.i.d. samples from a document space;
the subset ranking framework instead assumes that the queries are i.i.d. samples from
the query space and that each query is associated with a deterministic set of documents;
the two-layer ranking framework assumes i.i.d. sampling at both levels: the queries
are i.i.d. samples from the query space, and the documents associated with each query
are i.i.d. samples given that query. These frameworks also define risks differently.
The document ranking framework defines the expected risk by integrating over all
documents; the subset ranking framework defines it by integrating over all queries;
and the two-layer ranking framework defines it by integrating over both queries and
documents.
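To make the three sampling assumptions concrete, here is a minimal Python sketch built on a toy setup of our own. Queries are integer ids, documents are (feature, label) pairs drawn from a hypothetical conditional distribution P(d | q), and the loss is a pointwise 0-1 loss of a fixed scorer, chosen only for brevity. None of the names used (sample_doc, doc_loss, query_loss, fixed_docs, and the three *_risk functions) come from the chapter, and the frameworks themselves are usually analyzed with pairwise or listwise losses. The sketch only illustrates how data generation and the averaging behind the empirical risk differ from one framework to the next.

```python
import random
from statistics import mean

# Toy generative model (hypothetical, not from the chapter): a query is an
# integer id in range(NUM_QUERIES); a document is a (feature, label) pair
# whose distribution depends on the query.
NUM_QUERIES = 10


def sample_doc(rng, query):
    """Draw one (feature, label) document from the toy P(d | q)."""
    feature = rng.gauss(0.1 * query, 1.0)
    label = 1 if feature + rng.gauss(0.0, 0.5) > 0.1 * query else 0
    return (feature, label)


def doc_loss(query, doc):
    """Hypothetical 0-1 loss of a fixed linear scorer on one document."""
    feature, label = doc
    predicted_relevant = feature - 0.1 * query > 0
    return float(predicted_relevant != (label == 1))


def query_loss(query, docs):
    """Query-level loss: here simply the mean document loss (an assumption)."""
    return mean(doc_loss(query, d) for d in docs)


def document_ranking_risk(rng, n_docs):
    # Document ranking: every document is an independent draw from a single
    # document space, so each draw gets its own query; average over documents.
    losses = []
    for _ in range(n_docs):
        q = rng.randrange(NUM_QUERIES)
        losses.append(doc_loss(q, sample_doc(rng, q)))
    return mean(losses)


def fixed_docs(query, m):
    # Subset ranking: a query's document set is deterministic, so it is
    # regenerated from a generator seeded only by the query id.
    local = random.Random(query)
    return [sample_doc(local, query) for _ in range(m)]


def subset_ranking_risk(rng, n_queries, m):
    # Queries are i.i.d.; documents are never resampled. Average over queries.
    queries = [rng.randrange(NUM_QUERIES) for _ in range(n_queries)]
    return mean(query_loss(q, fixed_docs(q, m)) for q in queries)


def two_layer_risk(rng, n_queries, m):
    # Two-layer: queries are i.i.d., and for each drawn query its m documents
    # are fresh i.i.d. draws from P(d | q). Average over queries and documents.
    per_query = []
    for _ in range(n_queries):
        q = rng.randrange(NUM_QUERIES)
        docs = [sample_doc(rng, q) for _ in range(m)]
        per_query.append(query_loss(q, docs))
    return mean(per_query)


if __name__ == "__main__":
    rng = random.Random(0)
    print("document ranking:", document_ranking_risk(rng, 1000))
    print("subset ranking:  ", subset_ranking_risk(rng, 100, 10))
    print("two-layer:       ", two_layer_risk(rng, 100, 10))
```

In the subset ranking sketch, calling fixed_docs twice for the same query returns the same list, which mirrors the assumption that each query comes with a deterministic document set. In the two-layer sketch, by contrast, the same query receives a new document sample every time it is drawn.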