The Listwise Approach - Learning to Rank for Information Retrieval

Information Technology Reference

In-Depth Information

4.3.1 ListNet

In ListNet [ 4 ], the loss function is defined using the probability distribution on per-

mutations.

Actually the probability distributions on permutations have been well studied in

the field of probability theory. Many famous models have been proposed to repre-

sent permutation probability distributions, such as the Plackett-Luce model [ 20 , 24 ]

and the Mallows model [ 21 ]. Since a permutation has a natural one-to-one corre-

spondence with a ranked list, these researches can be naturally applied to ranking.

ListNet [ 4 ] is just such an example, demonstrating how to apply the Plackett-Luce

model to learning to rank.

Given the ranking scores of the documents outputted by the scoring function f

(i.e., s

m

j =

f(x j ) ), the Plackett-Luce model defines a probabil-

ity for each possible permutation π of the documents, based on the chain rule, as

follows:

={

s j }

1 , where s j =

m

ϕ(s π − 1 (j) )

u = 1 ϕ(s π − 1 (u) ) ,

P(π

|

s )

=

(4.22)

j

=

1

where π − 1 (j) denotes the document ranked at the j th position of permutation π

and ϕ is a transformation function, which can be linear, exponential, or sigmoid.

Please note that the Plackett-Luce model is scale invariant and translation invari-

ant in certain conditions. For example, when we use the exponential function as the

transformation function, after adding the same constant to all the ranking scores,

the permutation probability distribution defined by the Plackett-Luce model will

not change. When we use the linear function as the transformation function, after

multiplying all the ranking scores by the same constant, the permutation probabil-

ity distribution will not change. These properties are quite in accordance with our

intuitions on ranking.

With the Plackett-Luce model, for a given query q , ListNet first defines the per-

mutation probability distribution based on the scores given by the scoring func-

tion f . Then it defines another permutation probability distribution P y (π) based on

the ground truth label. For example, if the ground truth is given as relevance degrees,

it can be directly substituted into the Plackett-Luce model to obtain a probability

distribution. When the ground truth is given as a permutation π y (or a permutation

set Ω y ), one can define P y (π) as a delta function which only takes a value of 1

for this permutation (or the permutations in the set), and takes a value of 0 for all

the other permutations. One can also first employ a mapping function to map the

positions in the ground truth permutation to real-valued scores and then use ( 4.22 )

to compute the probability distribution. Furthermore, one can also choose to use the

Mallows model [ 21 ] to define P y (π) by taking the ground truth permutation as the

centroid in the model.

For the next step, ListNet uses the K-L divergence between the probability distri-

bution for the ranking model and that for the ground truth to define its loss function

Learning to Rank for Information Retrieval

Search WWH ::

Custom Search

Home