Data Preprocessing for Learning to Rank - Learning to Rank for Information Retrieval

Information Technology Reference

In-Depth Information

with (q i ,d j ) . This matrix is then normalized to be a query-to-document transition

matrix, denoted by A , where A i,j is the probability that q i transits to d j in one hop.

Similarly, the transpose of matrix W can be normalized to be a document-to-query

transition matrix B .Using A and B , one can compute the probability of transiting

from any node to another node in the bipartite graph. Then, the query similarity is

defined as the probability that one query transits to another query in two hops (or

the corresponding elements in the matrix AB ).

Experimental results in [ 15 ] have shown that the smooth technique is effective,

and training with the smoothed click-through data can lead to the gain of ranking

performance.

Note that query clustering has been well studied in the literature of Web search.

Therefore, the method proposed in [ 15 ] is not the only method that one can leverage.

Please also see [ 29 , 30 , 34 ] for more work on query clustering.

13.3 Training Data Selection

In the previous section, we have discussed general methodologies to mine ground

truth labels for learning to rank from click-through data. In Chap. 1,wehavealsoin-

troduced the labeling methodology with human annotators. Basically both the min-

ing and labeling methodologies assume that the data (either the click-through data

or the document in the annotation pool) are available in advance, and one deals with

the data (either performing mining or conducting labeling) in a passive way. How-

ever, this might not be the optimal choice because some assumptions in the mining

or labeling process might not be considered when collecting the data. This will make

the ground-truth labels obtained from these data less effective for training a ranking

model.

Here is an example about the labeling of TREC Million Query Track. When

forming the pool for judgment, those documents that can best distinguish different

runs of the participants were selected. As a result, when the human assessors labeled

the documents in the pool, those highly relevant documents that are ranked high by

all the runs will be missing since they are not in the pool at all. Although this will

not hurt evaluation (whose goal is to distinguish different methods), it will hurt

the model training (whose goal is to learn as much information about relevance

as possible from the training data). As a consequence, it is possible that not all

the judged documents are useful for training, and some useful documents are not

judged.

To tackle the aforementioned problems, it is better to think about the strategy, in

addition to the methodology, for mining or labeling. That is, can we present those

documents more suitable for training models but not for distinguishing methods to

the human annotators? Can we pre-define the distribution of the labeled data before

labeling (e.g., the number of queries versus the number of documents)? Should we

use all the labeled data, or conduct some selection before training? In this section,

we will introduce some previous work that tries to address the above issues.

Search WWH ::

Custom Search

Home