Information Technology Reference
In addition to discussing how to handle large data from a purely computational
perspective, we may want to investigate the problem from another angle. That is, it
may not be always necessary to increase the data scale. With more and more data,
the learning curve may get saturated, and the cost we pay for the extra computations
may become wasteful. To gain more understanding on this, we need to jointly con-
sider the learning theory and computational theory for ranking. This can also be an
important piece of future work.
20.6 Online Complexity Versus Accuracy
Most of the research on learning to rank has focused on optimizing the relevance of
the ranking results. This quest for accuracy can lead to very complex models, and
complex models are often computationally expensive. This brings the question on
whether this kind of models can be deployed in a real world ranking system where
latency is an important factor. In web search for instance there are stringent require-
ments on the execution time: e.g., the document scoring phase should typically not
exceed 50 milliseconds.
For this reason, there is a recent effort on trying to build models which
have a reduced execution time. For example, [ 18 ] explicitly addressed this accu-
racy/complexity trade-off for linear ranking functions. A linear function is of course
very fast to evaluate, but the feature computation cost should also be taken into
account. By performing feature selection, the authors of the aforementioned paper
were able to substantially reduce the online complexity without much loss in accu-
In the context of decision trees ensembles, execution time can be reduced using
early exits [ 3 ]: if, based on a partial evaluation of the trees, a document does not
appear to be relevant, this document is not further evaluated. The ideas in these two
papers could be combined using a cascade architecture pioneered in [ 17 ]. An en-
semble of trees would be learned such that early trees use only cheap features while
later trees are allowed to use more expensive features. But combined with early ex-
its, these expensive trees would not be evaluated for a large number of documents.
This architecture would effectively considerably reduce the online complexity of
decision trees ensembles.
This line of research bears some resemblance with large-scale learning. In both
cases, the goal is to reduce the time complexity due to large datasets, but large
scaling learning is concerned with reducing the training time, while the focus of
this section is on the testing time. Nevertheless, it might be possible to transfer
techniques and ideas between these two domains.
20.7 Robust Learning to Rank
In most of the existing works, multiple ranking functions are compared and evalu-
ated, and the best ranking function is selected based on some evaluation measures,