Generalization Analysis for Ranking - Learning to Rank for Information Retrieval

Information Technology Reference

In-Depth Information

•

If we only have a limited budget to label C documents in total, according to Theo-

rem 17.6 , there is an optimal trade off between the number of training queries and

that of training documents per query. This is consistent with previous empirical

findings in [ 18 ].

Actually one can attain the optimal trade off by solving the following optimiza-

tion problem:

2 M 2 log ( δ )

n

min

n,m ( 1 ) ,...,m (n)

D(L ◦ F ,n) +

D L

, m (i)

2

n

2 M 2 log δ

m (i) n 2

1

n

+

◦ F

i

=

1

i

=

1

n

s . t .

m i =

C.

i =

1

This optimization problem is easy to solve. For example, if VC( F

)

=

V ,

for th e pairwise 0-1 loss, the solution to the optimization problem is n ∗

=

+ √ 2lo g ( 4 /δ)

c 1 √ 2 V

c 1 √ V

√ C , m (i) ∗ ≡

C

n ∗

, where c 1 is a constant. From this result we note

the following.

n ∗ decreases with the increasing capacity of the function class. That is, we should

label fewer queries and more documents per query when the hypothesis space is

larger.

•

For fixed hypothesis space, n ∗ increases with the confidence level δ . That is, we

should label more query if we want the bound to hold with a larger probability.

•

The above findings can be used to explain the behavior of existing pairwise rank-

ing algorithms, and can be used to guide the construction of training set for learning

to rank.

17.3 Algorithm-Dependent Generalization Bound

Note that the uniform generalization bound is valid for any function in the hypoth-

esis space under investigation, no matter whether the function is the optimal one

learned by a given algorithm or not. Therefore, it is understandable that the bound

might not be very tight. In contrast, it is possible that we can get a tighter bound for

the optimal function learned by the algorithm. This is just the motivation of devel-

oping the algorithm-dependent generalization bounds. 2

Again, in this section, we

2 Note that the disadvantage of algorithm-dependent bounds lies in that they can only be used for

specific algorithms, and may not be derived for every algorithm.

Learning to Rank for Information Retrieval

Search WWH ::

Custom Search

Home