Generalization Analysis for Ranking - Learning to Rank for Information Retrieval

Information Technology Reference

In-Depth Information

While the above work uses the concept of the VC dimension for general-

ization analysis, the notion of the bipartite rank-shatter coefficient, denoted as

r( F ,m + ,m − ) ,isusedin[ 2 ]. This new notion has a similar meaning to the shatter-

ing coefficient in classification [ 16 , 17 ].

Using the bipartite rank-shatter coefficient as a tool, the following theorem has

been proven in [ 2 ]. For the class of linear scoring functions in the one-dimensional

feature space, r(

,m + ,m − ) is a constant, regardless of the values of m +

and

m − . In this case, the bound converges to zero at a rate of O( max

√ m +

{

√ m − }

) ,

and is therefore tighter than the bound based on the VC dimension (as given in

Theorem 17.1 ). For the class of linear scoring functions in the d -dimensional fea-

ture space ( d> 1), r(

,m + ,m − ) is of the order O((m + m − ) d ) , and in this case

the bound has a s im ilar con vergence rate to that based on the VC dimension, i.e.,

O( max

log (m + )

m +

, log (m − )

m −

{

} ) .

Theorem 17.2 Let

be the class of real-valued functions on

, then with prob-

ability at least 1

− δ (0 <δ< 1),

8 (m + +

m − )( log δ +

R 0 (f )

− R 0 (f ) ≤

, 2 m + , 2 m − ))

log r(

∀

∈ F

m + m −

(17.2)

Note that the above two results are only applicable to the case of bipartite ranking.

Some other researchers have extended these results to more general cases, e.g., the

case of k -partite ranking [ 9 , 15 ].

Here we take [ 9 ] as an example. In this work, the labels of documents are no

longer assumed to be positive or negative. Instead, a document is preferred to an-

other one if and only if the label of the former document is larger than that of the

latter document. Specifically, given two documents x u and x v whose labels are y u

and y v , the pairwise 0-1 loss is defined as I { (y u − y v )(f (x u ) − f(x v ))< 0 }

. In other words,

if the ranking result given by function f is in the same order of that given by the

ground-truth label, the loss is zero; otherwise the loss is one. Based on the pairwise

0-1 loss, the expected risks and empirical risks are defined as below.

R 0 (f )

[

} ]

(17.3)

{

(y u

−

y v )(f (x u )

−

f(x v ))< 0

m(m

R 0 (f )

(17.4)

{

(y u −

y v )(f (x u )

−

f(x v ))< 0

}

−

1 )

Using the properties of U-statistics [ 9 ], the following generalization bound is

obtained with respect to the aforementioned empirical and expected risks.

Learning to Rank for Information Retrieval

Search WWH ::

Custom Search

Home