loss function is convex, and therefore one can safely use a gradient descent method
to optimize the loss.
Here we would like to point out that ListMLE can be regarded as a special case of ListNet. If we set $P_y(\pi)$ in ListNet to be 1 for $\pi_y$, and 0 for all the other permutations, the K-L divergence between $P(\pi \mid f(w, \mathbf{x}))$ and $P_y(\pi)$ will turn out to be the negative log likelihood as defined in (4.24).
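To make the connection concrete, here is a minimal sketch of the ListMLE (Plackett–Luce) negative log likelihood; the function name and the toy scores below are illustrative, not from the original algorithm description:

```python
import math

def neg_log_likelihood(scores, pi):
    """Negative log Plackett-Luce likelihood of permutation pi.

    scores: list of ranking scores f(w, x_i), one per object.
    pi: a permutation given as a list of object indices, best first.
    This corresponds to the ListMLE loss of (4.24).
    """
    loss = 0.0
    for i in range(len(pi)):
        # Probability that pi[i] is ranked first among the remaining objects.
        denom = sum(math.exp(scores[j]) for j in pi[i:])
        loss -= scores[pi[i]] - math.log(denom)
    return loss
```

Because the loss is the negative log of a product of softmax-style probabilities, it is smooth in the scores, which is what makes gradient descent applicable.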
When the judgment is not a total order, one needs to use the concept of an equivalent permutation set $\Omega_y$ to extend the ListMLE algorithm. That is, we can regard all the permutations in $\Omega_y$ as ground truth permutations. Then we can choose either to maximize the average likelihood of the permutations in $\Omega_y$, or to maximize the largest likelihood among them (inspired by multi-instance learning [2]). In the latter case, the loss function becomes
$$L(f; \mathbf{x}, \Omega_y) = - \max_{\pi_y \in \Omega_y} \log P\big(\pi_y \mid f(w, \mathbf{x})\big).$$
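As a sketch of this max-likelihood variant, assuming the Plackett–Luce model for $P(\pi \mid f(w, \mathbf{x}))$; the brute-force enumeration below is only for illustration and directly exhibits the exponential cost discussed next:

```python
import itertools
import math

def pl_log_likelihood(scores, pi):
    # Log Plackett-Luce probability of permutation pi under the scores.
    ll = 0.0
    for i in range(len(pi)):
        ll += scores[pi[i]] - math.log(sum(math.exp(scores[j]) for j in pi[i:]))
    return ll

def max_likelihood_loss(scores, ratings):
    """Loss = -max log likelihood over permutations consistent with ratings.

    A permutation belongs to Omega_y if it never places a lower-rated
    object before a higher-rated one. Brute force: exponential in the
    list length, hence impractical beyond toy examples.
    """
    best = -math.inf
    n = len(scores)
    for pi in itertools.permutations(range(n)):
        if all(ratings[pi[i]] >= ratings[pi[i + 1]] for i in range(n - 1)):
            best = max(best, pl_log_likelihood(scores, list(pi)))
    return -best
```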
Since the size of $\Omega_y$ is exponentially large, it is costly to find the permutation with the smallest loss in each round of iteration. In [31] an efficient way of conducting this search is proposed: one first sorts the objects according to their ratings in the ground truth, and then, for objects having the same rating, sorts them in descending order of their ranking scores. It has been proven that the resultant permutation has the smallest loss. Note that the minimal-loss permutation changes with $w$, which suggests that the optimization process will be an iterative one. It has also been proven that this iterative process converges in polynomial time.
As an extension of ListMLE, in [15] a Gamma-distribution-based prior is introduced to regularize the likelihood loss. The corresponding algorithm has been adopted in applications like movie ranking and driver ranking, and has been shown to be very effective.
4.3.3 Ranking Using Cumulative Distribution Networks
In [ 16 ], a ranking method based on cumulative distribution networks is proposed,
which can be regarded as an extension of ListNet and ListMLE.
As pointed out in [ 16 ], unlike standard regression or classification in which we
predict outputs independently, in ranking we are interested in predicting structured
outputs so that mis-ranking one object can significantly affect whether we correctly
rank the other objects. For this purpose, a cumulative distribution network (CDN) is used, which is an undirected graphical model whose joint cumulative distribution function (CDF) $F(\mathbf{z})$ over a set of random variables is represented as a product of functions defined over subsets of these variables. Mathematically,
$$F(\mathbf{z}) = \prod_{c} \phi_c(\mathbf{z}_c),$$
where $\phi_c$ is a potential function and $c$ is a clique in the graphical model.
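As a toy illustration of this factorization, the sketch below builds a CDN over three variables with two cliques, using multivariate sigmoids as clique potentials; the sigmoid is a simple function that is non-decreasing in each argument and tends to 0 and 1 in the limits, so the product behaves like a CDF. The cliques and weights here are made up for illustration, not taken from [16]:

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def cdn_cdf(z):
    """CDF of a hypothetical CDN over (z1, z2, z3) with cliques
    c1 = {z1, z2} and c2 = {z2, z3}: F(z) = phi_c1(z1, z2) * phi_c2(z2, z3)."""
    z1, z2, z3 = z
    phi_c1 = sigmoid(z1 + z2)  # potential on clique {z1, z2}
    phi_c2 = sigmoid(z2 + z3)  # potential on clique {z2, z3}
    return phi_c1 * phi_c2
```

Because $z_2$ appears in both cliques, the two potentials couple the variables: changing one score moves the joint CDF of the others, which is exactly the structured-output behavior motivated above.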