defined. For example, the utility can be any evaluation measure. Suppose the utility
is denoted as U(·), then the following expected utility is used as the objective for learning:

U(w; x, y) = E_{p(s | x, w)}[U(s, y)],    (4.6)
where w is the parameter of the ranking function, and s is the vector of ranking scores of the documents.
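To make the objective concrete, one simple way to estimate it is by Monte Carlo: draw score vectors from p(s | x, w) and average the utility over the draws. The sketch below (in Python) only illustrates Eq. (4.6); the sampling routine, the utility function, and all names are placeholders, not the actual inference procedure of [37].

    import numpy as np

    def expected_utility(sample_scores, utility, y, n_samples=1000, seed=0):
        """Monte Carlo estimate of U(w; x, y) = E_{p(s|x,w)}[U(s, y)] in Eq. (4.6).

        sample_scores: callable taking a numpy Generator and returning one draw of
                       the score vector s ~ p(s | x, w) (the draw depends on w)
        utility:       callable U(s, y), e.g., NDCG of the ranking induced by s
        y:             relevance labels of the documents
        """
        rng = np.random.default_rng(seed)
        values = [utility(sample_scores(rng), y) for _ in range(n_samples)]
        return float(np.mean(values))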
With the above definition, although the utility itself may be discontinuous and non-differentiable, it does not contain any model parameter and thus causes no difficulty for the learning process. What contains the model parameter is the conditional probability p(s | x, w).
In [37], independence between the documents is assumed (i.e., p(s | x, w) = ∏_j p(s_j | x_j, w)), and a generalized linear model [22] is used to define p(s_j | x_j, w), the conditional probability for each single document.
A generalized linear model consists of a likelihood model p(y | θ), a linear combination of the inputs and model parameters w^T x, and a link function that maps the parameter θ to the real line. In particular, a binomial function (i.e., Bin(·|·)) is used as the likelihood model in [37], and a cumulative normal function (i.e., Ψ(·)) is used to define the probit link function:

p(s_j | x, w) = Bin(s_j; Ψ(√π · w^T x_j; 0, 1), m),    (4.7)
where m is the number of documents.
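As an illustration of this generalized linear model, the following sketch evaluates the per-document score distribution of Eq. (4.7) using standard library routines, with Bin(·|·) taken to be the binomial pmf and Ψ(·) the standard normal CDF; the function name and the treatment of any scaling of w^T x_j are illustrative assumptions, not code from [37].

    import numpy as np
    from scipy.stats import binom, norm

    def score_distribution(x_j, w, m):
        """Per-document distribution p(s_j | x, w) of Eq. (4.7): a binomial
        likelihood whose success probability comes from the probit link applied
        to the linear score w^T x_j."""
        theta = norm.cdf(np.dot(w, x_j))  # probit link maps the real line into (0, 1);
                                          # any scaling of w^T x_j, cf. Eq. (4.7), folds in here
        return binom.pmf(np.arange(m + 1), m, theta)  # probabilities of s_j = 0, ..., m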
Then the algorithm maximizes the expected utility (equivalently, the negative
utility can be regarded as the loss function). In particular, the approximate Bayesian
inference procedure [ 18 ] with a factorized Gaussian prior is used to learn the model
parameters.
Note that in addition to optimizing evaluation measures, the decision theoretical
framework can also be used to optimize other utilities, e.g., the probability of user
clicks on the documents.
4.2.1.3 Approximate Rank
In [ 25 ], Qin et al. point out that the underlying reason why evaluation measures
are non-smooth is that rank positions are non-smooth with respect to ranking
scores. Therefore, they propose approximating the rank positions with smooth functions of the ranking scores, so that the approximate evaluation measures become differentiable and easier to optimize.
Here we take NDCG as an example to illustrate the approximation process. The
same idea also applies to MAP. For more details, please refer to [ 25 ].
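For intuition, the sketch below follows a common sigmoid-based instantiation of this idea: the rank of each document is approximated by a sum of sigmoids of score differences, and the smooth ranks are then plugged into the NDCG formula. The temperature alpha and the function names are illustrative assumptions; see [25] for the exact approximation used there.

    import numpy as np

    def approx_ranks(scores, alpha=10.0):
        """Smooth surrogate of rank positions: each pair (i, j) contributes
        sigmoid(alpha * (s_i - s_j)) to the approximate rank of document j,
        approaching the true rank as alpha grows."""
        s = np.asarray(scores, dtype=float)
        diff = s[None, :] - s[:, None]              # diff[j, i] = s_i - s_j
        sig = 1.0 / (1.0 + np.exp(-alpha * diff))
        np.fill_diagonal(sig, 0.0)                  # exclude the i = j term
        return 1.0 + sig.sum(axis=1)

    def approx_ndcg(scores, labels, alpha=10.0):
        """NDCG with the exact ranks replaced by their smooth approximation,
        which makes the measure differentiable in the ranking scores."""
        labels = np.asarray(labels, dtype=float)
        gains = 2.0 ** labels - 1.0
        dcg = np.sum(gains / np.log2(1.0 + approx_ranks(scores, alpha)))
        ideal = np.sum(np.sort(gains)[::-1] / np.log2(2.0 + np.arange(len(gains))))
        return dcg / ideal if ideal > 0 else 0.0

Because the approximate ranks are smooth in the scores, gradients of this surrogate measure with respect to the scores (and hence with respect to w through the ranking function) can be computed directly.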
If one changes the index of summation in the definition of NDCG (see Chap. 1)
from the positions in the ranking result to the indexes of documents, NDCG can be