When the number of training queries is finite, the expected risk and the empirical
risk are not necessarily close to each other.
Furthermore, as shown in [14], IR-SVM [7] has a query-level stability with coefficient τ(n) = 4κ²/(λn). On this basis, one can find that when the number of training queries approaches infinity, with a large probability the empirical risk of IR-SVM will converge to its expected risk, at a convergence rate of O(1/√n). When the number of queries is finite, the query-level generalization bound is a decreasing function of the number of training queries.
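To make the scaling behavior concrete, the following sketch (not from the text) plugs hypothetical constants into the stability coefficient τ(n) = 4κ²/(λn) and into the O(1/√n) order of the generalization gap, to show how both shrink as the number of training queries n grows. The values of κ and λ are assumptions chosen purely for illustration.

```python
import math

# Hypothetical constants for illustration only: kappa bounds the feature
# norm and lam is the regularization parameter of IR-SVM.
kappa = 1.0
lam = 0.01

def stability_coefficient(n):
    """Query-level stability coefficient tau(n) = 4*kappa^2 / (lambda * n)."""
    return 4 * kappa ** 2 / (lam * n)

def gap_order(n):
    """Order of the generalization gap, O(1/sqrt(n)), up to constants."""
    return 1.0 / math.sqrt(n)

# Both quantities decrease as the number of training queries grows.
for n in (100, 1000, 10000):
    print(f"n={n:>6}  tau(n)={stability_coefficient(n):.4f}  "
          f"1/sqrt(n)={gap_order(n):.4f}")
```

Note that τ(n) decays at rate 1/n, faster than the 1/√n order of the overall bound; the slower √n term dominates the generalization gap.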
By comparing the query-level generalization abilities of Ranking SVM and IR-SVM, we can see that the convergence rates for these two algorithms are both O(1/√n). However, when the number of training queries is finite, the bound for IR-SVM is much tighter than that for Ranking SVM.
17.3.3 For Two-Layer Ranking
As far as we know, there is still no work on the algorithm-dependent generalization
bounds under the two-layer ranking framework.
17.4 Summary
In this chapter, we have introduced the generalization analysis of ranking algorithms, with different types of bounds and under different statistical ranking frameworks.
As can be seen, early works assume the use of the document ranking frame-
work. As a result, the corresponding theoretical results can describe the accuracy of
ranking unseen document pairs. However, as mentioned in [14], such a result is not highly desired in the context of information retrieval. This is mainly because people care more about the ranking accuracy for new queries than for new document pairs. Some attempts have been made in this direction [13, 14]; however, these
theoretical results can only explain the change of generalization ability with respect
to the increasing number of training queries, and ignore the influence of the num-
ber of training documents per query on the generalization ability. In order to obtain
such kinds of results, one needs to investigate the generalization theory under the
two-layer ranking framework. Work along this line is, however, still very limited [8].
For example, currently the two-layer generalization ability of listwise ranking meth-
ods is not clear, and the study on two-layer stability (or other algorithm-dependent
bounds) is still missing. More work is expected to be conducted along this line.
For clarity, we summarize what has been introduced in this chapter using Tables 17.2 and 17.3. From the tables, we can see that there is still large space to explore in this direction. Generalization analysis for ranking is still an emerging research area, and we can expect its fast development in the future.