13.3.1 Document and Query Selection for Labeling
No matter how the labels are obtained, the process is non-trivial and one needs
to consider how to make it more cost-effective. There are at least two issues to
be considered for this purpose. First, if we can only label a fixed total number of
documents, how should we distribute them (more queries and fewer documents per
query vs. fewer queries and more documents per query)? Second, if we can only
label a fixed total number of documents, which of the documents in the corpus
should we present to the annotators?
13.3.1.1 Deep Versus Shallow Judgments
In [ 32 ], an empirical study is conducted regarding the influence of label distribution
on learning to rank. In the study, LambdaRank [ 11 ] is used as the learning-to-rank
algorithm, and a dataset from a commercial search engine is used as the experimen-
tal platform. The dataset contains 382 features and is split into training, validation,
and test sets with 2,000, 1,000, and 2,000 queries respectively. The average number
of judged documents per query in the training set is 350, although this number
varies considerably across queries.
To test the effect of judging more queries versus more documents per query, dif-
ferent training sets are formed by (i) sampling p% of the queries while keeping all
the judged documents for each sampled query, and (ii) keeping all the queries while
sampling p% of the judged documents per query. LambdaRank is then trained on
each of these training sets, and NDCG@10 is computed on the test set. Each
experiment is repeated ten times, and the average NDCG@10 value is used for the
final study.
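The two sampling schemes can be sketched as follows. This is a simplified illustration rather than the actual setup of [32]: LambdaRank training is omitted, the toy dataset and all function names are invented, and NDCG@10 follows the standard definition with gains of 2^rel − 1 and logarithmic position discounts.

```python
import math
import random

def ndcg_at_10(relevances):
    """NDCG@10 for one query: `relevances` are the graded labels of
    documents in the ranked order produced by some model."""
    def dcg(rels):
        return sum((2 ** r - 1) / math.log2(i + 2)
                   for i, r in enumerate(rels[:10]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

def sample_queries(dataset, p, rng):
    """Scheme (i): keep p% of the queries, with all judged documents
    retained for each sampled query (deep judgments, fewer queries)."""
    queries = list(dataset)
    k = max(1, int(len(queries) * p / 100))
    return {q: dataset[q] for q in rng.sample(queries, k)}

def sample_documents(dataset, p, rng):
    """Scheme (ii): keep all queries, with p% of the judged documents
    retained per query (shallow judgments, more queries)."""
    out = {}
    for q, docs in dataset.items():
        k = max(1, int(len(docs) * p / 100))
        out[q] = rng.sample(docs, k)
    return out

# Toy data: {query_id: [(features, relevance_label), ...]},
# 20 queries with 50 judged documents each.
rng = random.Random(0)
dataset = {q: [((rng.random(),), rng.randint(0, 4)) for _ in range(50)]
           for q in range(20)}

shallow = sample_documents(dataset, 20, rng)  # all queries, fewer docs each
narrow = sample_queries(dataset, 20, rng)     # fewer queries, all their docs

# Both schemes consume the same total judgment budget.
print(sum(len(v) for v in shallow.values()),
      sum(len(v) for v in narrow.values()))
```

With p = 20, both schemes yield the same total number of judgments, which is what makes the comparison between deep and shallow labeling fair.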
According to the experimental results, one has the following observations.
- Given a limited number of judgments, it is better to judge more queries with
  fewer documents per query than fewer queries with more documents per query.
  Sometimes additional documents per query do not result in any further improve-
  ment in the quality of the training set.
- The lower bound on the number of documents per query is 8 on the dataset used
  in the study. When this lower bound is met, if one has to decrease the total
  number of judgments further, it is better to decrease the number of queries in
  the training set.
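As a worked illustration of this rule of thumb (the budget figure here is hypothetical, not from [32]; only the 8-document lower bound comes from the study):

```python
budget = 16_000            # total judgments we can afford (hypothetical)
docs_floor = 8             # per-query lower bound reported in the study

# Shallow allocation: judge only docs_floor documents per query,
# which maximizes the number of distinct queries covered.
queries_shallow = budget // docs_floor   # number of queries covered

# Deep allocation: 50 judged documents per query instead,
# covering far fewer queries under the same budget.
queries_deep = budget // 50

print(queries_shallow, queries_deep)
```

Under the same budget, the shallow allocation covers many more queries, which, per the observations above, tends to produce a more informative training set.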
The explanation in [32] for the above experimental findings is based on the infor-
mativeness of the training set. Given a certain number of judged documents for a
query, judging yet more documents for that query does not add much information
to the training set. Including a new query, however, is much more informative,
since the new query may have quite different properties from the queries already in
the training set.
In [8], a theoretical explanation of this empirical finding is provided,
based on the statistical learning theory for ranking. Please refer to Chap. 17 for
more details.