Information Technology Reference

In-Depth Information

Chapter 17

Generalization Analysis for Ranking

Abstract
In this chapter, we introduce the generalization analysis on learning-to-

rank methods. In particular, we first introduce the uniform generalization bounds

and then the algorithm-dependent generalization bounds. The uniform bounds hold

for any ranking function in a given function class. The algorithm-dependent bounds

instead consider the specific ranking function learned by the given algorithm, thus

can usually be tighter. The bounds introduced in this chapter are derived under dif-

ferent ranking frameworks, and can explain behaviors of different learning-to-rank

algorithms. We also show the limitations of existing analyses and discuss how to

improve them in future work.

17.1 Overview

Generalization ability is an important theoretical property of a machine learning

algorithm. It basically describes how a model learned from the training set will per-

form on the unseen test data. This performance is usually determined by the number

of instances in the training data, and the complexity of the model. Generalization

analysis has been well studied for classification, and a lot of work has been done on

the topic. Comparatively, generalization analysis for ranking is not that mature, but

still several attempts have been made.

There are in general two kinds of generalization analysis. The first one is called

uniform generalization analysis, which tries to reveal a bound of the generalization

ability for any function in the function class under investigation. In other words, such

an analysis is not only applicable to the optimal model learned by a given algorithm.

The second one is called algorithm-dependent generalization analysis, which is only

applicable to the model learned from the training data using a specific algorithm.

Since more information has been used in the second type of generalization analysis,

the generalization bound is usually tighter than the uniform bound. However, as a

trade off, its application scope will be smaller than that of the uniform bound.

Both types of generalization analyses heavily depend on the statistical ranking

frameworks introduced in the previous chapter. For example, with the document

ranking framework, one is concerned with the generalization over documents (or

document pairs); with the subset ranking framework, one is concerned with the