loss function is convex, and therefore one can safely use a gradient descent method
to optimize the loss.
Here we would like to point out that ListMLE can be regarded as a special case of ListNet. If we set $P_y(\pi)$ in ListNet to be 1 for $\pi_y$, and 0 for all the other permutations, the K-L divergence between $P(\pi \mid f(w, \mathbf{x}))$ and $P_y(\pi)$ will turn out to be the negative log likelihood as defined in (4.24).
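To make the connection concrete, here is a minimal sketch of the ListMLE (Plackett–Luce) negative log likelihood; the function name and the toy scores below are illustrative, not from the original algorithm description:

```python
import math

def neg_log_likelihood(scores, pi):
    """Negative log Plackett-Luce likelihood of permutation pi.

    scores: list of ranking scores f(w, x_i), one per object.
    pi: a permutation given as a list of object indices, best first.
    This corresponds to the ListMLE loss of (4.24).
    """
    loss = 0.0
    for i in range(len(pi)):
        # Probability that pi[i] is ranked first among the remaining objects.
        denom = sum(math.exp(scores[j]) for j in pi[i:])
        loss -= scores[pi[i]] - math.log(denom)
    return loss
```

Because the loss is the negative log of a product of softmax-style probabilities, it is smooth in the scores, which is what makes gradient descent applicable.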
When the judgment is not a total order, one needs to use the concept of an equivalent permutation set $\Omega_y$ to extend the ListMLE algorithm. That is, we can regard all the permutations in $\Omega_y$ as ground truth permutations. Then we can choose either to maximize the average likelihood of the permutations in $\Omega_y$, or to maximize the largest likelihood among them (inspired by multi-instance learning [2]). In the latter case, the loss function becomes
$$L(f; \mathbf{x}, \Omega_y) = - \max_{\pi_y \in \Omega_y} \log P\big(\pi_y \mid f(w, \mathbf{x})\big).$$
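As a sketch of this max-likelihood variant, assuming the Plackett–Luce model for $P(\pi \mid f(w, \mathbf{x}))$; the brute-force enumeration below is only for illustration and directly exhibits the exponential cost discussed next:

```python
import itertools
import math

def pl_log_likelihood(scores, pi):
    # Log Plackett-Luce probability of permutation pi under the scores.
    ll = 0.0
    for i in range(len(pi)):
        ll += scores[pi[i]] - math.log(sum(math.exp(scores[j]) for j in pi[i:]))
    return ll

def max_likelihood_loss(scores, ratings):
    """Loss = -max log likelihood over permutations consistent with ratings.

    A permutation belongs to Omega_y if it never places a lower-rated
    object before a higher-rated one. Brute force: exponential in the
    list length, hence impractical beyond toy examples.
    """
    best = -math.inf
    n = len(scores)
    for pi in itertools.permutations(range(n)):
        if all(ratings[pi[i]] >= ratings[pi[i + 1]] for i in range(n - 1)):
            best = max(best, pl_log_likelihood(scores, list(pi)))
    return -best
```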
Since the size of $\Omega_y$ is exponentially large, it is costly to find the permutation with the smallest loss in each round of iteration. In [31] an efficient way of conducting this search is proposed: one first sorts the objects according to their ratings in the ground truth, and then, for objects having the same rating, sorts them in descending order of their ranking scores. It has been proven that the resultant permutation has the smallest loss. Note that the minimal-loss permutation changes with $w$, which suggests that the optimization process will be an iterative one. It has also been proven that this iterative process converges in polynomial time.
As an extension of ListMLE, in [15] a Gamma-distribution-based prior is introduced to regularize the likelihood loss. The corresponding algorithm has been adopted in applications like movie ranking and driver ranking, and has been shown to be very effective.
4.3.3 Ranking Using Cumulative Distribution Networks
In [ 16 ], a ranking method based on cumulative distribution networks is proposed,
which can be regarded as an extension of ListNet and ListMLE.
As pointed out in [ 16 ], unlike standard regression or classification in which we
predict outputs independently, in ranking we are interested in predicting structured
outputs so that mis-ranking one object can significantly affect whether we correctly
rank the other objects. For this purpose, a cumulative distribution network (CDN) is used, which is an undirected graphical model whose joint cumulative distribution function (CDF) $F(\mathbf{z})$ over a set of random variables is represented as a product of functions defined over subsets of these variables. Mathematically,
$$F(\mathbf{z}) = \prod_{c} \phi_c(\mathbf{z}_c),$$
where $\phi_c$ is a potential function and $c$ is a clique in the graphical model.
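As a toy illustration of this factorization, the sketch below builds a CDN over three variables with two cliques, using multivariate sigmoids as clique potentials; the sigmoid is a simple function that is non-decreasing in each argument and tends to 0 and 1 in the limits, so the product behaves like a CDF. The cliques and weights here are made up for illustration, not taken from [16]:

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def cdn_cdf(z):
    """CDF of a hypothetical CDN over (z1, z2, z3) with cliques
    c1 = {z1, z2} and c2 = {z2, z3}: F(z) = phi_c1(z1, z2) * phi_c2(z2, z3)."""
    z1, z2, z3 = z
    phi_c1 = sigmoid(z1 + z2)  # potential on clique {z1, z2}
    phi_c2 = sigmoid(z2 + z3)  # potential on clique {z2, z3}
    return phi_c1 * phi_c2
```

Because $z_2$ appears in both cliques, the two potentials couple the variables: changing one score moves the joint CDF of the others, which is exactly the structured-output behavior motivated above.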