The Listwise Approach - Learning to Rank for Information Retrieval

Information Technology Reference

In-Depth Information

=

where g q is any sign-preserving function, such as g q (x)

α q x , and α q is a query-

dependent positive constant.

It is clear that E

[

|

]

represents the lack of compatibility between the relative

document orderings given by π and those given by s , with more positive energy

indicating less compatibility.

Using the energy function, we can now define the conditional Boltzmann distri-

bution over document permutations by exponentiating and normalizing, as follows:

π

s

Z exp −

] ,Z

exp −

] .

1

P(π

|

s )

=

E

[

π

|

s

=

E

[

π

|

s

(4.29)

π

s ) and Z exactly since both of them contain

sums over exponentially many permutations. In practice, however, efficient approx-

imations of these quantities allow inference and learning in the model.

After defining the permutation probability as above, one can follow the idea in

ListMLE [ 31 ] to define the loss function as the negative log likelihood of the ground

truth permutation, or follow the idea in ListNet [ 4 ] to define the loss function as

the K-L divergence between the permutation probability distribution given by the

scoring function f and that given by the ground truth labels.

In addition, it is also possible to extend the above idea to optimize evaluation

measures [ 30 ]. For example, suppose the evaluation measure is NDCG, then its

expected value with respect to all the possible permutations can be computed as

follows:

Note that it is costly to compute P(π

|

E

[

NDCG

]=

P(π

|

s ) NDCG (π, y ).

(4.30)

π

s ) in the above formula is a function of the ranking model f , and

it is continuous and differentiable, one can use simple gradient based methods to ef-

fectively optimize ( 4.30 ). In order to avoid over fitting, one can also use the linear

combination of the expected NDCG and the K-L divergence loss to guide the opti-

mization process. Experimental results have shown that in this way, the BoltzRank

method can outperform many other ranking methods, such as AdaRank [ 32 ] and

ListNet [ 4 ].

Since only P(π |

4.4 Summary

As shown in this chapter, different kinds of listwise ranking algorithms have been

proposed. Intuitively speaking, they model the ranking problem in a more natural

way than the pointwise and pairwise approaches, and thus can address some prob-

lems that these two approaches have encountered. As we have discussed in the previ-

ous chapters, for the pointwise and pairwise approaches, the positional information

is invisible to their loss functions, and they ignore the fact that some documents

(or document pairs) are associated with the same query. Comparatively speaking,

Learning to Rank for Information Retrieval

Search WWH ::

Custom Search

Home