Table 10.3 (Continued)

ID  Feature description
39  $\sum_{t_i \in q \cap d} TF(t_i, d) \cdot \log\left(\frac{|C|}{DF(t_i)}\right)$ in 'title + abstract'
40  $\sum_{t_i \in q \cap d} \log\left(\frac{TF(t_i, d)}{LEN(d)} \cdot \frac{|C|}{TF(t_i, C)} + 1\right)$ in 'title + abstract'
41  BM25 of 'title + abstract'
42  log(BM25) of 'title + abstract'
43  LMIR.DIR of 'title + abstract'
44  LMIR.JM of 'title + abstract'
45  LMIR.ABS of 'title + abstract'
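For concreteness, here is a toy sketch of how two of the tabulated features (39 and 40) might be computed. All names are illustrative, and the readings of |C| (number of documents for feature 39, total term count for feature 40) are assumptions rather than LETOR's official definitions.

import math

def feature_39(query_terms, tf_d, df, num_docs):
    # Sum over query terms present in d of TF(t, d) * log(|C| / DF(t)).
    return sum(tf_d[t] * math.log(num_docs / df[t])
               for t in query_terms if tf_d.get(t, 0) > 0)

def feature_40(query_terms, tf_d, doc_len, tf_c, collection_len):
    # Sum over query terms present in d of log(TF(t, d)/LEN(d) * |C|/TF(t, C) + 1).
    return sum(math.log(tf_d[t] / doc_len * collection_len / tf_c[t] + 1)
               for t in query_terms if tf_d.get(t, 0) > 0)

print(feature_39(["ranking"], {"ranking": 3}, {"ranking": 100}, 10000))  # 3 * log(100)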
ranking models. These models are selected using the validation set and finally evaluated on the test set.
In order to make the evaluation more comprehensive, five-fold cross validation is suggested in LETOR. In particular, each dataset in LETOR is partitioned into five parts with about the same number of queries, denoted as S1, S2, S3, S4, and S5, so that five-fold cross validation can be conducted. For each fold, three parts are used for training the ranking model, one part is used for tuning the hyperparameters of the ranking algorithm (e.g., the number of iterations in RankBoost [ 3 ] and the combination coefficient in the objective function of Ranking SVM [ 4 , 7 ]), and the remaining part is used for evaluating the ranking performance of the learned model (see Table 10.5 ). The average performance over the five folds is used to measure the overall performance of a learning-to-rank algorithm.
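To make the split concrete, here is a minimal sketch of the fold rotation over the partitions S1-S5. The exact per-fold assignment is the one listed in Table 10.5; the rotation below, and the function name letor_folds, are an assumed illustration.

def letor_folds(parts=("S1", "S2", "S3", "S4", "S5")):
    """Yield (train, valid, test) splits for five-fold cross validation."""
    n = len(parts)
    for i in range(n):
        train = [parts[(i + j) % n] for j in range(3)]  # three parts for training
        valid = parts[(i + 3) % n]                      # one part for hyperparameter tuning
        test = parts[(i + 4) % n]                       # one part for final evaluation
        yield train, valid, test

for fold, (train, valid, test) in enumerate(letor_folds(), start=1):
    print(f"Fold {fold}: train={train}, valid={valid}, test={test}")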
One may have noticed that the natural labels in all the LETOR datasets are relevance degrees. As aforementioned, however, pairwise preferences and even a total order of the documents are also valid labels. To facilitate learning with such kinds of labels, in LETOR 4.0 a total order of the labeled documents in MQ2007 and MQ2008 is derived by heuristics and used for training.
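The text does not spell out these heuristics. As one plausible illustration (an assumption, not the actual LETOR 4.0 recipe), a total order and pairwise preferences can be read off the graded labels as in the following sketch:

from itertools import combinations

def derive_order_and_pairs(docs):
    """docs: list of (doc_id, relevance_degree) pairs for one query."""
    # A total order consistent with the graded labels (ties broken arbitrarily).
    total_order = [d for d, _ in sorted(docs, key=lambda x: -x[1])]
    # Pairwise preferences: d1 is preferred over d2 iff its degree is higher.
    pairs = [(d1, d2) if r1 > r2 else (d2, d1)
             for (d1, r1), (d2, r2) in combinations(docs, 2) if r1 != r2]
    return total_order, pairs

order, prefs = derive_order_and_pairs([("d1", 2), ("d2", 0), ("d3", 1)])
print(order)  # ['d1', 'd3', 'd2']
print(prefs)  # [('d1', 'd2'), ('d1', 'd3'), ('d3', 'd2')]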
In addition to standard supervised ranking, LETOR also supports semi-supervised ranking and rank aggregation. Different from supervised ranking, semi-supervised ranking considers both judged and unjudged query-document pairs for training. For rank aggregation, a query is associated with a set of input ranked lists rather than with features for individual documents; the task is to output a better ranked list by aggregating the multiple input lists.
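As a concrete, if simple, example of rank aggregation, Borda count merges the input lists by position-based scores. It is used here purely for illustration; LETOR does not prescribe a particular aggregation method.

from collections import defaultdict

def borda_aggregate(ranked_lists):
    """ranked_lists: list of ranked lists of doc ids, best first."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        n = len(ranking)
        for pos, doc in enumerate(ranking):
            scores[doc] += n - pos  # higher positions earn more points
    # Output a single list, best first, by total Borda score.
    return sorted(scores, key=scores.get, reverse=True)

lists = [["d1", "d2", "d3"], ["d2", "d1", "d3"], ["d1", "d3", "d2"]]
print(borda_aggregate(lists))  # ['d1', 'd2', 'd3']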
The LETOR datasets, containing the aforementioned feature representations of documents, their relevance judgments with respect to queries, and the partitioned training, validation, and test sets, can be downloaded from the official LETOR website, http://research.microsoft.com/~LETOR/. 5
5 Note that the LETOR datasets are updated frequently, and it is expected that more datasets will be added in the future. Furthermore, the LETOR website has evolved into a portal for research on learning to rank that is not limited to data release alone: one can find representative papers, tutorials, events, research groups, and more in the area of learning to rank on the website.