A query set with 106 queries on the OHSUMED corpus has been used in many previous works [11, 19], with each query describing a medical search need (associated with patient information and topic information). The relevance degrees of the documents with respect to the queries are judged by human assessors, on three levels: definitely relevant, partially relevant, and irrelevant. There are a total of 16,140 query-document pairs with relevance judgments.
10.2.3 The “Gov2” Corpus and Two Query Sets
The Million Query (MQ) track ran for the first time in TREC 2007 and then became a regular track in the following years. The MQ track has two design purposes. First, it explores ad-hoc retrieval on a large collection of documents. Second, it investigates questions of system evaluation, in particular whether it is better to evaluate using many shallow judgments or fewer thorough judgments.
The MQ track uses the so-called "terabyte" or "Gov2" corpus as its document collection. This corpus is a collection of Web data crawled from websites in the .gov domain in early 2004. The collection includes about 25,000,000 documents in 426 gigabytes.
There are about 1700 queries with labeled documents in the MQ track of 2007 (denoted MQ2007 for short) and about 800 queries in the MQ track of 2008 (denoted MQ2008). The judgments are given on three levels: highly relevant, relevant, and irrelevant.
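Both the OHSUMED and the MQ judgments are therefore three-level graded labels. As a minimal illustration, such textual labels are typically mapped to integer grades before being used by graded evaluation measures or learning-to-rank algorithms; the 2/1/0 encoding below is an assumed convention for illustration, not something prescribed by the datasets themselves.

```python
# Hypothetical mapping of the three-level judgments to integer grades.
# The numeric values (2/1/0) are an assumed convention, chosen only so
# that "more relevant" corresponds to a larger grade.
GRADE = {
    # OHSUMED vocabulary
    "definitely relevant": 2,
    "partially relevant": 1,
    # MQ2007/MQ2008 vocabulary
    "highly relevant": 2,
    "relevant": 1,
    # shared
    "irrelevant": 0,
}

def to_grade(label: str) -> int:
    """Convert a textual relevance judgment into an integer grade."""
    return GRADE[label.lower()]

print(to_grade("Highly relevant"), to_grade("irrelevant"))  # -> 2 0
```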
10.3 Document Sampling
For a reason similar to that for selecting documents for labeling, it is not feasible to extract feature vectors for all the documents in a corpus either. A reasonable strategy is to sample some "possibly" relevant documents, and then extract feature vectors for the corresponding query-document pairs.
For TD2003, TD2004, NP2003, NP2004, HP2003, and HP2004, following the suggestions in [9] and [12], the documents are sampled in the following way. First, the BM25 model is used to rank all the documents with respect to each query; then the top 1000 documents for each query are selected for feature extraction, as sketched below. Note that this sampling strategy is intended to ease the experimental investigation; it by no means implies that learning to rank is applicable only in such a re-ranking scenario.
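The following sketch illustrates this sampling step under simplifying assumptions: an in-memory corpus of pre-tokenized documents and one common Okapi BM25 formulation. The function names, the k1/b defaults, and the toy corpus are illustrative choices, not part of the LETOR specification.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, doc_freqs, num_docs, avg_doc_len,
               k1=1.2, b=0.75):
    """One common Okapi BM25 formulation (the +1 inside the log keeps the
    idf non-negative, as in several widely used implementations)."""
    tf = Counter(doc_terms)
    score = 0.0
    for term in set(query_terms):
        if term not in tf:
            continue
        df = doc_freqs.get(term, 0)
        idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)
        norm = (tf[term] * (k1 + 1)) / (
            tf[term] + k1 * (1 - b + b * len(doc_terms) / avg_doc_len))
        score += idf * norm
    return score

def sample_top_k(query_terms, corpus, k=1000):
    """Rank every document in `corpus` against the query with BM25 and
    keep only the top-k documents for subsequent feature extraction."""
    num_docs = len(corpus)
    avg_doc_len = sum(len(terms) for terms in corpus.values()) / num_docs
    doc_freqs = Counter()
    for terms in corpus.values():
        doc_freqs.update(set(terms))  # document frequency of each term
    scored = [(doc_id,
               bm25_score(query_terms, terms, doc_freqs, num_docs, avg_doc_len))
              for doc_id, terms in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy usage: in LETOR the cut-off is k=1000; here k=2 keeps the output small.
corpus = {
    "d1": "tax policy government budget".split(),
    "d2": "health care reform".split(),
    "d3": "government tax reform".split(),
}
print(sample_top_k("tax policy".split(), corpus, k=2))
```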
Unlike the above tasks, in which unjudged documents are regarded as irrelevant, the judgments for OHSUMED, MQ2007, and MQ2008 explicitly contain an "irrelevant" category, and unjudged documents are ignored in the evaluation. Correspondingly, in LETOR, only judged documents are used for feature extraction for these corpora, and all unjudged documents are ignored.
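A minimal sketch of this filtering step is given below, assuming judgments are stored as a dictionary keyed by (query id, document id) and that unjudged pairs are simply absent; the data layout and function name are hypothetical.

```python
def judged_pairs(candidates, judgments):
    """Keep only query-document pairs that carry an explicit judgment;
    unjudged candidates are dropped rather than treated as irrelevant."""
    kept = []
    for qid, doc_ids in candidates.items():
        for doc_id in doc_ids:
            label = judgments.get((qid, doc_id))  # None means unjudged
            if label is not None:
                kept.append((qid, doc_id, label))
    return kept

# Toy usage: d2 has no judgment, so it is excluded from feature extraction.
candidates = {"q1": ["d1", "d2", "d3"]}
judgments = {("q1", "d1"): 2, ("q1", "d3"): 0}  # 0 = explicitly irrelevant
print(judged_pairs(candidates, judgments))
# -> [('q1', 'd1', 2), ('q1', 'd3', 0)]
```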