In this section, we will introduce previous work according to the statistical ranking frameworks that they use.
17.3.1 For Document Ranking
In [ 1 , 3 ], stability is used as a tool to derive algorithm-dependent generalization bounds for pairwise ranking algorithms, under the framework of document ranking.
Let A be a ranking algorithm, and let L be the loss function that is minimized in A. Suppose we have learned a ranking model f_1 from m training documents using algorithm A. Then we replace a training document with another document and learn a new model f_2 from the new training data. We say that A has uniform loss stability β with respect to L if the difference between the losses with respect to f_1 and f_2 on any unseen document pair (x_u, x_v) is smaller than β(m). In other words, if the model learned from the training data is robust to a small change in the training data, the algorithm is regarded as having certain stability.
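To make the definition concrete, the sketch below (not from [ 1 , 3 ]) probes loss stability empirically: it swaps a single training document, retrains, and records the largest change in pairwise loss over held-out pairs. The learner `train` and the hinge-style pairwise loss are illustrative assumptions, not the algorithms studied in the referenced work.

```python
# Minimal sketch: empirically probing the uniform loss stability of a
# pairwise ranking algorithm. The training routine and loss are assumptions.
import numpy as np

def pairwise_loss(f, x_u, x_v, y_uv):
    """Hinge-style pairwise loss L(f; x_u, x_v, y_uv), with y_uv in {+1, -1}."""
    return max(0.0, 1.0 - y_uv * (f(x_u) - f(x_v)))

def train(X, y, lam=0.1):
    """Placeholder learner: a ridge-regularized linear scorer (illustrative only)."""
    w = np.linalg.solve(X.T @ X + lam * len(y) * np.eye(X.shape[1]), X.T @ y)
    return lambda x: float(w @ x)

def empirical_beta(X, y, x_new, y_new, test_pairs, lam=0.1):
    """Replace one training document, retrain, and measure the largest change
    in pairwise loss over unseen pairs -- an empirical proxy for beta(m)."""
    f1 = train(X, y, lam)                # model learned from the original sample
    X2, y2 = X.copy(), y.copy()
    X2[0], y2[0] = x_new, y_new          # swap a single training document
    f2 = train(X2, y2, lam)
    return max(abs(pairwise_loss(f1, xu, xv, yuv) - pairwise_loss(f2, xu, xv, yuv))
               for xu, xv, yuv in test_pairs)
```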
The generalization bound obtained in [ 1 , 3 ] is shown in Theorem 17.7.

Theorem 17.7 Let L be a loss function with L(f; x_u, x_v, y_{u,v}) ∈ [0, B], and suppose A has uniform loss stability β. Then with probability at least 1 − δ (0 < δ < 1), the following inequality holds:

$$
R(f_S) \le \hat{R}(f_S) + 2\beta(m) + \bigl(m\beta(m) + B\bigr)\sqrt{\frac{2\ln(1/\delta)}{m}}.
\tag{17.11}
$$
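The right-hand side of (17.11) is straightforward to evaluate numerically. The helper below simply transcribes the inequality; the numbers in the usage example are made up for illustration.

```python
import math

def stability_bound(emp_risk, beta_m, m, B, delta):
    """Right-hand side of (17.11): an upper bound on the expected risk R(f_S),
    given the empirical risk, loss stability beta(m), sample size m, the loss
    bound B, and the confidence parameter delta."""
    return emp_risk + 2 * beta_m + (m * beta_m + B) * math.sqrt(2 * math.log(1 / delta) / m)

# Illustrative numbers only: m = 10,000 training documents, losses in [0, 1],
# beta(m) = 1e-3, and confidence level 1 - delta = 0.95.
print(stability_bound(emp_risk=0.25, beta_m=1e-3, m=10_000, B=1.0, delta=0.05))
```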
For different algorithms, one can get different stability coefficients (note that not all algorithms are stable: for an unstable algorithm, the stability coefficient does not approach zero as the number of training documents m approaches infinity).
In particular, symmetric ranking algorithms with general regularization terms are
stable. For example, Ranking SVM is a stable algorithm, and its stability coefficient
is

$$
\beta(m) = \frac{16\kappa^2}{\lambda m},
\tag{17.12}
$$

where K(x, x) ≤ κ² < ∞ for all x ∈ X, and λ is the tradeoff coefficient for the regularization term in Ranking SVM.
As can be seen from the above discussions, for algorithms like Ranking SVM [ 11 , 12 ], the bound converges to zero at a rate of O(1/√m). This bound is tighter than the bound based on the Rank-Shatter Coefficient (see inequality ( 17.2 )) except for the case of using linear scoring functions in the one-dimensional feature space. This clearly shows the advantage of the algorithm-dependent bound.
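As a rough numerical illustration (with assumed values for κ, λ, B, and δ), plugging (17.12) into the gap terms of (17.11) exhibits the O(1/√m) decay: since mβ(m) = 16κ²/λ is constant, the dominant term shrinks with √m.

```python
import math

def ranking_svm_beta(m, kappa=1.0, lam=1.0):
    """Stability coefficient of Ranking SVM from (17.12): 16*kappa^2 / (lambda*m)."""
    return 16 * kappa**2 / (lam * m)

def generalization_gap(m, B=1.0, delta=0.05, kappa=1.0, lam=1.0):
    """Gap terms of (17.11) with Ranking SVM's beta(m) plugged in; because
    m*beta(m) = 16*kappa^2/lambda is constant, the gap decays as O(1/sqrt(m))."""
    b = ranking_svm_beta(m, kappa, lam)
    return 2 * b + (m * b + B) * math.sqrt(2 * math.log(1 / delta) / m)

for m in (10**3, 10**4, 10**5):
    # Each tenfold increase in m shrinks the gap by roughly a factor of sqrt(10).
    print(m, round(generalization_gap(m), 4))
```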