Future Work - Learning to Rank for Information Retrieval

Chapter 20

Future Work

Abstract In this chapter, we discuss possible future work on learning to rank. In particular, we present some potential research topics along the following directions: sample selection bias, direct learning from logs, feature engineering, advanced ranking models, large-scale learning to rank, online complexity, robust learning to rank, and online learning to rank. At the end of the chapter, we briefly discuss new scenarios beyond ranking, which seem to be the future trend of search. Algorithmic and theoretical study of these new scenarios may lead to another promising research direction.

As mentioned several times in the book, there are still many open problems regarding learning to rank. We have discussed them at the end of several chapters. In addition, there are some other future work items [1], as listed in this chapter. Note that the list below is by no means complete. The field of learning to rank is still growing very fast, and many more topics await further investigation.

20.1 Sample Selection Bias

Training sets for learning to rank are typically constructed using the so-called pooling strategy, in which the documents to be labeled are the top results returned by existing ranking functions. These documents are thus, by construction, more relevant than the vast majority of other documents. In a search engine, however, the test process is different. A web search engine typically uses a scheme with two (or more) phases to retrieve the relevant documents. The first phase is a filtering phase, in which the potentially relevant documents, according to a basic ranking function, are selected from the entire search engine index. These documents are then scored in a second phase by the learned ranking function. But there are still a large number of documents in this second phase, on the order of tens of thousands, and most of them have little relevance to the query. There is thus a striking difference in the document distribution between training and test. This problem is called sample selection bias [19]: the documents in the training set have not been drawn at random from the test distribution; they are biased toward relevant documents.
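The mismatch described above can be made concrete with a small simulation. The following is a minimal sketch, not an implementation from the book: the toy index, the two scoring functions `ranker_a` and `ranker_b`, the pool depth, and the candidate count are all hypothetical values chosen for illustration. It builds a training set by pooling and a second-phase candidate set by first-phase filtering, then compares the fraction of relevant documents in each.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A document is (doc_id, true_relevance). Both fields are made up for this toy.
Doc = Tuple[int, float]
Scorer = Callable[[Doc], float]


def pooled_training_set(index: Sequence[Doc], rankers: Sequence[Scorer],
                        depth: int) -> List[Doc]:
    """Pooling: take the union of the top-`depth` results of each ranker."""
    pool: Dict[int, Doc] = {}
    for score in rankers:
        for doc in sorted(index, key=score, reverse=True)[:depth]:
            pool[doc[0]] = doc
    return list(pool.values())


def first_phase(index: Sequence[Doc], basic_score: Scorer,
                n_candidates: int) -> List[Doc]:
    """Phase 1: keep the top candidates according to a basic ranking function."""
    return sorted(index, key=basic_score, reverse=True)[:n_candidates]


def relevant_fraction(docs: Sequence[Doc], threshold: float = 0.5) -> float:
    """Share of documents whose true relevance reaches the threshold."""
    return sum(1 for _, rel in docs if rel >= threshold) / len(docs)


# Hypothetical index: 1000 documents, of which only the first 20 are relevant.
index: List[Doc] = [(i, 1.0 if i < 20 else 0.0) for i in range(1000)]


def ranker_a(doc: Doc) -> float:
    """Assumed basic ranker: true relevance plus deterministic pseudo-noise."""
    doc_id, rel = doc
    return rel + 0.8 * ((doc_id * 37) % 101) / 101


def ranker_b(doc: Doc) -> float:
    """A second assumed ranker, used for pooling and as the phase-2 scorer."""
    doc_id, rel = doc
    return rel + 0.8 * ((doc_id * 53) % 97) / 97


# Training distribution: documents pooled from the top of existing rankers.
train = pooled_training_set(index, [ranker_a, ranker_b], depth=10)

# Test distribution: everything that survives phase 1 is scored in phase 2.
candidates = first_phase(index, ranker_a, n_candidates=200)
reranked = sorted(candidates, key=ranker_b, reverse=True)

train_frac = relevant_fraction(train)
test_frac = relevant_fraction(candidates)
print(f"relevant fraction in pooled training set: {train_frac:.2f}")
print(f"relevant fraction in phase-2 candidates:  {test_frac:.2f}")
```

In this toy setup every pooled training document is relevant, while only 10% of the documents the learned function must score in the second phase are. The gap is exaggerated by the assumed numbers, but it shows the same direction of bias as described in the text.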
