with (q_i, d_j). This matrix is then normalized to be a query-to-document transition matrix, denoted by A, where A_{i,j} is the probability that q_i transits to d_j in one hop. Similarly, the transpose of matrix W can be normalized to be a document-to-query transition matrix B. Using A and B, one can compute the probability of transiting from any node to another node in the bipartite graph. Then, the query similarity is defined as the probability that one query transits to another query in two hops (or the corresponding elements in the matrix AB).
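The construction above can be sketched as follows. The click matrix W and its values are illustrative assumptions (W[i, j] would count the clicks observed between query q_i and document d_j); the point is only to show how row-normalizing W and its transpose yields the two-hop similarity AB:

```python
import numpy as np

# Hypothetical click counts: rows are queries q_1..q_3, columns are documents d_1..d_3.
W = np.array([[3.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 0.0, 4.0]])

# A: query-to-document transition matrix (each row of W normalized to sum to 1).
A = W / W.sum(axis=1, keepdims=True)

# B: document-to-query transition matrix (each row of W^T normalized to sum to 1).
B = W.T / W.T.sum(axis=1, keepdims=True)

# Two-hop transitions: (AB)[i, k] is the probability that query q_i
# reaches query q_k via some clicked document, i.e., the query similarity.
S = A @ B

# Since A and B are both row-stochastic, each row of S is a probability
# distribution over queries.
print(np.allclose(S.sum(axis=1), 1.0))
```

Because A and B are row-stochastic, AB is again row-stochastic, so each query's similarity scores to all queries (including itself) sum to one.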
Experimental results in [15] have shown that this smoothing technique is effective, and that training with the smoothed click-through data can lead to gains in ranking performance.
Note that query clustering has been well studied in the Web search literature. Therefore, the method proposed in [15] is not the only one that can be leveraged. Please also see [29, 30, 34] for more work on query clustering.
13.3 Training Data Selection
In the previous section, we have discussed general methodologies for mining ground-truth labels for learning to rank from click-through data. In Chap. 1, we have also introduced the labeling methodology with human annotators. Basically, both the mining and labeling methodologies assume that the data (either the click-through data or the documents in the annotation pool) are available in advance, and one deals with the data (either performing mining or conducting labeling) in a passive way. However, this might not be the optimal choice, because some assumptions in the mining or labeling process might not be considered when collecting the data. This will make the ground-truth labels obtained from these data less effective for training a ranking model.
Here is an example concerning the labeling in the TREC Million Query Track. When forming the pool for judgment, those documents that can best distinguish different runs of the participants were selected. As a result, when the human assessors labeled the documents in the pool, the highly relevant documents that are ranked high by all the runs will be missing, since they are not in the pool at all. Although this will not hurt evaluation (whose goal is to distinguish different methods), it will hurt model training (whose goal is to learn as much information about relevance as possible from the training data). As a consequence, it is possible that not all the judged documents are useful for training, and some useful documents are not judged at all.
To tackle the aforementioned problems, it is better to think about the strategy, in addition to the methodology, for mining or labeling. That is, can we present to the human annotators those documents that are more suitable for training models, rather than those best for distinguishing methods? Can we pre-define the distribution of the labeled data before labeling (e.g., the number of queries versus the number of documents per query)? Should we use all the labeled data, or conduct some selection before training? In this section, we will introduce some previous work that tries to address the above issues.