Chapter 15
Statistical Learning Theory for Ranking
Abstract In this chapter, we introduce statistical learning theory for ranking. To
better understand existing learning-to-rank algorithms, and to design better ones,
it is very helpful to understand their theoretical properties in depth. In this
chapter, we give the big picture of theoretical analysis for ranking, and point
out several important issues to be investigated: the statistical ranking framework,
generalization ability, and statistical consistency of ranking methods.
15.1 Overview
As a new machine learning problem, ranking is not only about effective algorithms
but also about the theory behind these algorithms.
Theoretical analysis of an algorithm always plays an important role in machine
learning. This is because, in practice, one can only observe experimental results
on relatively small-scale datasets (e.g., the experimental results on the LETOR
benchmark datasets introduced in Chap. 11). To some extent, such empirical results
might not be fully reliable: a small training set sometimes cannot fully realize
the potential of a learning algorithm, and a small test set sometimes cannot
reflect the true performance of an algorithm. This is because the input and output
spaces are too large to be well represented by a small number of samples. In this
regard, theory is needed to analyze the performance of a learning algorithm in the
limit where the training data become infinite and the test data are randomly
sampled from the input and output spaces.
15.2 Statistical Learning Theory
In Fig. 15.1, we give a typical taxonomy of statistical learning theory for
empirical risk minimization, which summarizes most of the theoretical aspects that
one is interested in.
Basically, to understand a learning problem theoretically, we need a statistical
framework, which describes the assumptions on the data generation process (e.g.,
whether the instances are i.i.d.). We also need to define the true loss of the
learning problem.
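As a concrete sketch of these two ingredients (using generic notation that is not
fixed in this excerpt), the empirical risk minimization framework contrasts the
expected (true) risk of a ranking function with its empirical counterpart on a
finite training set:

```latex
% Expected (true) risk of a ranking function f under a true loss L,
% where the instances (x, y) are drawn i.i.d. from a distribution P:
R(f) = \mathbb{E}_{(x,y)\sim P}\bigl[L(f; x, y)\bigr]

% Empirical risk of f on a training set \{(x_i, y_i)\}_{i=1}^{m}:
\hat{R}(f) = \frac{1}{m}\sum_{i=1}^{m} L(f; x_i, y_i)
```

Generalization analysis then studies how the gap between $R(f)$ and $\hat{R}(f)$
behaves over the function class as $m$ grows, and consistency analysis asks
whether minimizing a (surrogate of the) empirical risk also minimizes the true
risk in the large-sample limit.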