Information Technology Reference
The relevance judgments are obtained from a retired labeling set of the Microsoft
Bing search engine, and take five values from 0 (irrelevant) to 4 (perfectly
relevant).
There are 136 features in total. These features are not taken from Bing but were
extracted by Microsoft Research. All of them are widely used in the research
community, and they include query-document matching features, document features,
Web graph features, and user behavior features. The detailed feature list can be
found at the official website of Microsoft Learning-to-Rank Datasets (see footnote 2).
The availability of information about these features enables researchers to study
the impact of an individual feature on the ranking performance.
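
As a rough illustration of how such feature vectors might be read, the sketch below parses a single record. It assumes the files use the LETOR-style text layout `<label> qid:<id> <index>:<value> ...`, which is not stated in this section; the function name `parse_line` and the sample record in the usage note are hypothetical and not taken from the datasets.

```python
from typing import Dict, Tuple


def parse_line(line: str) -> Tuple[int, str, Dict[int, float]]:
    """Parse one record of the ASSUMED layout '<label> qid:<id> <index>:<value> ...'."""
    # Drop an optional trailing '#' comment and surrounding whitespace.
    line = line.split("#", 1)[0].strip()
    tokens = line.split()
    if len(tokens) < 2 or not tokens[1].startswith("qid:"):
        raise ValueError("expected '<label> qid:<id> ...'")

    label = int(tokens[0])            # relevance judgment, 0 to 4
    qid = tokens[1][len("qid:"):]     # query identifier, kept as a string

    features: Dict[int, float] = {}
    for token in tokens[2:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)  # 1-based feature index
    return label, qid, features
```

For example, `parse_line("2 qid:10 1:0.5 2:3 136:1e-3")` yields the label `2`, the query id `"10"`, and a dictionary mapping feature indices to values. Grouping the parsed records by query id then gives the per-query document lists that ranking algorithms operate on.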
The measures used in Microsoft Learning-to-Rank Datasets are NDCG, P@k
[1], and MAP [1], just as in the LETOR datasets [3]. Furthermore, the datasets have
been partitioned into five parts for five-fold cross validation, following the same
strategy as in LETOR. Currently there are no official baselines on these datasets
either, so researchers need to implement their own.
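
The three measures named above can be sketched for a single query as follows. This is a minimal illustration, not the official evaluation tool: it assumes the input is the list of relevance grades in ranked order, that a grade of at least `threshold` (default 1, an assumption) counts as relevant for P@k and average precision, and that NDCG uses the common gain 2^grade - 1 with a log2 position discount. All function names are hypothetical.

```python
import math
from typing import Sequence


def precision_at_k(grades: Sequence[int], k: int, threshold: int = 1) -> float:
    """P@k: fraction of the top-k positions holding a relevant document."""
    return sum(1 for g in grades[:k] if g >= threshold) / k


def average_precision(grades: Sequence[int], threshold: int = 1) -> float:
    """AP for one query; MAP is the mean of AP over all queries.

    Assumes `grades` covers every judged document of the query, so the
    number of relevant documents equals the number of hits in the list.
    """
    hits = 0
    total = 0.0
    for rank, g in enumerate(grades, start=1):
        if g >= threshold:
            hits += 1
            total += hits / rank  # precision at this relevant position
    return total / hits if hits else 0.0


def dcg_at_k(grades: Sequence[int], k: int) -> float:
    """DCG@k with gain 2^grade - 1 and discount log2(rank + 1)."""
    return sum((2 ** g - 1) / math.log2(rank + 1)
               for rank, g in enumerate(grades[:k], start=1))


def ndcg_at_k(grades: Sequence[int], k: int) -> float:
    """NDCG@k: DCG@k divided by the DCG@k of the ideal ordering."""
    ideal = dcg_at_k(sorted(grades, reverse=True), k)
    return dcg_at_k(grades, k) / ideal if ideal > 0 else 0.0
```

Under the five-fold protocol, a baseline would be trained on the training portion of each fold, these per-query values would be averaged over the queries in that fold's test portion, and the five fold-level results would then be averaged.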
Given the above trends of releasing larger datasets, we believe that with the
continued efforts from the entire research community as well as the industry, more
data resources for learning to rank will be available and the research on learning
to rank for information retrieval can be significantly advanced.
1. Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. Addison-Wesley, Reading
2. Järvelin, K., Kekäläinen, J.: Cumulated gain-based evaluation of IR techniques. ACM
Transactions on Information Systems 20 (4), 422-446 (2002)
3. Liu, T.Y., Xu, J., Qin, T., Xiong, W.Y., Li, H.: LETOR: Benchmark dataset for research on
learning to rank for information retrieval. In: SIGIR 2007 Workshop on Learning to Rank for
Information Retrieval (LR4IR 2007) (2007)
2 http://research.microsoft.com/~MSLR