Future Work - Learning to Rank for Information Retrieval

Information Technology Reference

In-Depth Information

In addition to discussing how to handle large data from a purely computational

perspective, we may want to investigate the problem from another angle. That is, it

may not be always necessary to increase the data scale. With more and more data,

the learning curve may get saturated, and the cost we pay for the extra computations

may become wasteful. To gain more understanding on this, we need to jointly con-

sider the learning theory and computational theory for ranking. This can also be an

important piece of future work.

20.6 Online Complexity Versus Accuracy

Most of the research on learning to rank has focused on optimizing the relevance of

the ranking results. This quest for accuracy can lead to very complex models, and

complex models are often computationally expensive. This brings the question on

whether this kind of models can be deployed in a real world ranking system where

latency is an important factor. In web search for instance there are stringent require-

ments on the execution time: e.g., the document scoring phase should typically not

exceed 50 milliseconds.

For this reason, there is a recent effort on trying to build models which

have a reduced execution time. For example, [ 18 ] explicitly addressed this accu-

racy/complexity trade-off for linear ranking functions. A linear function is of course

very fast to evaluate, but the feature computation cost should also be taken into

account. By performing feature selection, the authors of the aforementioned paper

were able to substantially reduce the online complexity without much loss in accu-

racy.

In the context of decision trees ensembles, execution time can be reduced using

early exits [ 3 ]: if, based on a partial evaluation of the trees, a document does not

appear to be relevant, this document is not further evaluated. The ideas in these two

papers could be combined using a cascade architecture pioneered in [ 17 ]. An en-

semble of trees would be learned such that early trees use only cheap features while

later trees are allowed to use more expensive features. But combined with early ex-

its, these expensive trees would not be evaluated for a large number of documents.

This architecture would effectively considerably reduce the online complexity of

decision trees ensembles.

This line of research bears some resemblance with large-scale learning. In both

cases, the goal is to reduce the time complexity due to large datasets, but large

scaling learning is concerned with reducing the training time, while the focus of

this section is on the testing time. Nevertheless, it might be possible to transfer

techniques and ideas between these two domains.

20.7 Robust Learning to Rank

In most of the existing works, multiple ranking functions are compared and evalu-

ated, and the best ranking function is selected based on some evaluation measures,

Search WWH ::

Custom Search

Home