Chapter 20
Future Work
Abstract
In this chapter, we discuss possible future work on learning to rank. In particular, we point out potential research topics along the following directions: sample selection bias, direct learning from logs, feature engineering, advanced ranking models, large-scale learning to rank, online complexity, robust learning to rank, and online learning to rank. At the end of the chapter, we briefly discuss new scenarios beyond ranking, which appear to be the future trend of search. Algorithmic and theoretical studies of these new scenarios may open up another promising research direction.
As mentioned several times in the book, there are still many open problems in learning to rank. We have discussed some of them at the end of several chapters. In addition, there are other future work items [ 1 ], which are listed in this chapter. Note that the list below is by no means complete: the field of learning to rank is still growing very fast, and many more topics await further investigation.
20.1 Sample Selection Bias
Training sets for learning to rank are typically constructed using the so-called pooling strategy, in which only the top-ranked documents returned by a number of existing ranking systems are selected for relevance judgment. These documents are thus, by construction, more relevant than the vast majority of other documents. In a search engine, however, the test process is different. A web search engine typically uses a scheme with two (or more) phases to retrieve the relevant documents. The first phase is a filtering phase, in which the documents that are potentially relevant according to a basic ranking function are selected from the entire search engine index. These documents are then scored in a second phase by the learned ranking function. However, a large number of documents, typically tens of thousands, still remain in this second phase, and most of them have little relevance to the query. There is thus a striking difference in the document distribution between training and test. This problem is called sample selection bias [ 19 ]: the documents in the training set have not been drawn at random from the test distribution; they are biased toward relevant documents.
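The mismatch between the pooled training set and the second-phase candidates can be illustrated with a small simulation. This is a toy sketch under stated assumptions, not a method from this book: it assumes a hypothetical latent relevance score per document drawn from a standard normal distribution, treats documents above an arbitrary threshold as relevant, models every ranking function as that latent score plus Gaussian noise, and uses scaled-down, hypothetical sizes (a 100,000-document index, pools built from the top 50 results of three systems, and 10,000 phase-one candidates). All constants and helper names are illustrative only.

```python
import heapq
import random

# Hypothetical, scaled-down sizes chosen only for illustration.
CORPUS_SIZE = 100_000
POOL_DEPTH = 50            # top-k taken from each system when pooling
NUM_POOLED_SYSTEMS = 3     # number of systems contributing to the pool
PHASE_ONE_SIZE = 10_000    # candidates passed from phase one to phase two
RELEVANCE_THRESHOLD = 2.5  # assumed latent score above which a document is relevant


def top_k(scores, k):
    """Return the indices of the k highest-scoring documents."""
    return heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)


def noisy_ranker(latent, noise_sd, rng):
    """Assumed ranker model: latent relevance plus Gaussian noise."""
    return [s + rng.gauss(0.0, noise_sd) for s in latent]


def relevant_fraction(doc_ids, latent):
    """Fraction of the given documents whose latent score passes the threshold."""
    return sum(latent[i] > RELEVANCE_THRESHOLD for i in doc_ids) / len(doc_ids)


def simulate(seed=0):
    rng = random.Random(seed)
    # Latent "true" relevance of every document in the (toy) index.
    latent = [rng.gauss(0.0, 1.0) for _ in range(CORPUS_SIZE)]

    # Training set via pooling: union of the top-k results of several systems.
    pool = set()
    for _ in range(NUM_POOLED_SYSTEMS):
        pool.update(top_k(noisy_ranker(latent, 1.0, rng), POOL_DEPTH))

    # Test set: candidates that a noisier basic ranking function passes to phase two.
    phase_one = top_k(noisy_ranker(latent, 2.0, rng), PHASE_ONE_SIZE)

    return relevant_fraction(pool, latent), relevant_fraction(phase_one, latent)


if __name__ == "__main__":
    train_frac, test_frac = simulate()
    print(f"relevant fraction in pooled training set:  {train_frac:.3f}")
    print(f"relevant fraction in phase-two candidates: {test_frac:.3f}")
```

Running the script should report a far higher fraction of relevant documents in the pooled set than among the phase-two candidates, which is the training/test mismatch described above; the exact numbers depend entirely on the assumed noise levels and sizes.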