Introduction - Learning to Rank for Information Retrieval


(1, 3, 2) is 2/3, and the τ_K between (1, 2, 3) and (3, 1, 2) is 1/3. Therefore, we can obtain that τ_K(π, Ω_l) = 2/3 in this case.
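The pairwise comparison behind this example can be checked with a minimal sketch. It rests on several assumptions that are not stated in this passage: τ_K is taken to be the unweighted fraction of document pairs that two permutations order the same way, the score against a set Ω_l is taken to be the best score over its members, π is taken to be (1, 2, 3), and Ω_l is taken to consist of the two permutations mentioned. The names `tau_k` and `tau_k_set` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def tau_k(pi, pi_l):
    """Fraction of document pairs that pi and pi_l order the same way.

    Assumption: unweighted pairwise-agreement form of Kendall's tau,
    so the value lies in [0, 1]. pi[i] is the rank position of document i.
    """
    pairs = list(combinations(range(len(pi)), 2))
    concordant = sum(
        1 for u, v in pairs
        if (pi[u] - pi[v]) * (pi_l[u] - pi_l[v]) > 0
    )
    return Fraction(concordant, len(pairs))


def tau_k_set(pi, omega_l):
    """Assumption: a set of ground-truth permutations is scored by its best match."""
    return max(tau_k(pi, pi_l) for pi_l in omega_l)


pi = (1, 2, 3)                      # assumed model ranking
omega_l = [(1, 3, 2), (3, 1, 2)]    # assumed ground-truth set

print(tau_k(pi, omega_l[0]))    # 2/3
print(tau_k(pi, omega_l[1]))    # 1/3
print(tau_k_set(pi, omega_l))   # 2/3
```

Exact fractions are used instead of floats so that the printed values can be compared directly with the numbers in the text.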

To summarize, these evaluation measures share some common properties (see footnote 12).

1. All these evaluation measures are calculated at the query level. That is, the measure is first computed for each query and then averaged over all queries in the test set. No matter how poorly the documents associated with a particular query are ranked, that query will not dominate the evaluation, since each query carries the same weight in the average.

2. All these measures are position based. That is, rank position is explicitly used. Small changes in the scores given by a ranking model do not change the rank positions until one document's score passes another's, so position-based measures are usually discontinuous and non-differentiable with respect to the scores. This makes the optimization of these measures quite difficult. We discuss this further in Sect. 4.2.
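The second property can be illustrated with a small sketch. It assumes DCG in its common form (gain 2^label - 1, discount 1/log2(1 + position)) as a stand-in for any position-based measure; the scores and labels are made up for illustration, and `dcg` is a hypothetical helper name.

```python
import math


def dcg(scores, labels):
    """DCG of the ranking obtained by sorting documents by descending score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return sum(
        (2 ** labels[i] - 1) / math.log2(position + 1)
        for position, i in enumerate(order, start=1)
    )


labels = [0, 1]        # document 1 is relevant, document 0 is not
base = [0.50, 0.40]    # hypothetical model scores

# Raise the score of the relevant document in small steps.
for delta in (0.00, 0.05, 0.09, 0.11, 0.20):
    scores = [base[0], base[1] + delta]
    print(f"delta={delta:.2f}  DCG={dcg(scores, labels):.4f}")
```

Under these assumptions the measure stays at the same value while the relevant document's score remains below 0.50, and jumps only once the two scores cross. It is therefore piecewise constant in the scores, which is why gradient-based optimization cannot be applied to it directly.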

Note that when designing ranking models, many researchers assume that the model can assign a score to each query-document pair independently of other documents. When performing evaluation, however, all the documents associated with a query are considered together. Otherwise, one cannot determine the rank position of a document, and the aforementioned measures cannot be defined.
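A minimal sketch of this distinction follows: scores are treated as given per document, but a rank position only exists once all documents of a query are sorted together, and the final figure is an average over queries. Reciprocal rank is used here only as a simple stand-in measure; the test set, its scores and labels, and the function names are hypothetical.

```python
def rank_positions(scores):
    """Map each document index to its 1-based rank position (higher score first)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return {doc: position for position, doc in enumerate(order, start=1)}


def reciprocal_rank(scores, labels):
    """1 / position of the first relevant document, or 0.0 if none is relevant."""
    positions = rank_positions(scores)
    relevant = [positions[i] for i, label in enumerate(labels) if label > 0]
    return 1.0 / min(relevant) if relevant else 0.0


# Hypothetical test set: per query, independently assigned scores and binary labels.
test_set = {
    "q1": ([0.9, 0.2, 0.4], [1, 0, 0]),
    "q2": ([0.1, 0.8, 0.3], [1, 0, 0]),
}

# The measure is computed per query, then averaged with equal weight.
per_query = {q: reciprocal_rank(s, l) for q, (s, l) in test_set.items()}
mean_measure = sum(per_query.values()) / len(per_query)
print(per_query)
print(mean_measure)
```

Here the relevant document of q2 ends up in the last position, yet q2 contributes exactly as much to the average as q1 does.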

1.3 Learning to Rank

Many ranking models have been introduced in the previous section, most of which contain parameters. For example, there are parameters k_1 and b in BM25 (see (1.2)), parameter λ in LMIR (see (1.3)), and parameter α in PageRank (see (1.5)). To get reasonably good ranking performance (in terms of evaluation measures), one needs to tune these parameters using a validation set. Nevertheless, parameter tuning is far from trivial, especially considering that evaluation measures are discontinuous and non-differentiable with respect to the parameters. In addition, a model perfectly tuned on the validation set sometimes performs poorly on unseen test queries. This is usually called over-fitting. Another issue concerns the combination of these ranking models. Given that many models have been proposed in the literature, it is natural to investigate how to combine them to create an even more effective new model. This is, however, not straightforward either.
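To make the tuning problem concrete, here is a minimal sketch of derivative-free tuning by grid search on a validation set. Everything in it is an assumption made for illustration: the model is a hypothetical interpolation of two feature scores with a single weight (it is not BM25, LMIR, or PageRank), the validation data are invented, and mean reciprocal rank stands in for the evaluation measure.

```python
def mean_reciprocal_rank(weight, queries):
    """Average reciprocal rank when each document is scored by
    weight * f1 + (1 - weight) * f2 (a hypothetical two-feature model).

    Assumes every query has at least one relevant document.
    """
    total = 0.0
    for docs in queries:
        ranked = sorted(docs, key=lambda d: -(weight * d[0] + (1 - weight) * d[1]))
        first_relevant = next(
            position for position, d in enumerate(ranked, start=1) if d[2] == 1
        )
        total += 1.0 / first_relevant
    return total / len(queries)


# Hypothetical validation set: each document is (f1, f2, relevance label).
validation = [
    [(0.9, 0.1, 0), (0.4, 0.8, 1), (0.2, 0.3, 0)],
    [(0.2, 0.5, 0), (0.7, 0.3, 1), (0.1, 0.1, 0)],
]

# The measure is piecewise constant in the weight, so candidates are
# simply enumerated instead of following a gradient.
grid = [i / 10 for i in range(11)]
for w in grid:
    print(f"weight={w:.1f}  MRR={mean_reciprocal_rank(w, validation):.2f}")

best = max(grid, key=lambda w: mean_reciprocal_rank(w, validation))
print("best weight on validation set:", best)
```

On this toy data the measure forms plateaus over the grid, and several weights tie for the best value; `max` simply returns the first of them. The selected weight reflects only these two validation queries, so it says nothing about performance on unseen test queries.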

12 Note that this is not a complete introduction to evaluation measures for information retrieval. Several other measures have been proposed in the literature, some of which also consider the novelty and diversity of the search results in addition to relevance. One may refer to [2, 17, 56, 91] for more information.
