Information Technology Reference
InDepth Information
loss function is convex, and therefore one can safely use a gradient descent method
to optimize the loss.
Here we would like to point out that ListMLE can be regarded as a special case
of ListNet. If we set
P
y
(π)
in ListNet to be 1 for
π
y
, and 0 for all the other permuta
tions, the KL divergence between
P(π
f(w,
x
))
and
P
y
(π)
will turn out to be the
negative log likelihood as defined in (
4.24
).
When the judgment is not total order, one needs to use the concept of equivalent
permutation set
Ω
y
to extend the ListMLE algorithm. That is, we can regard all
the permutations in
Ω
y
to be the ground truth permutations. Then we can choose
to maximize the average likelihood of the permutations in
Ω
y
, or to maximize the
largest likelihood in it (inspired by multiinstance learning [
2
]). In the latter case,
the loss function becomes

−
log
P
π
y

f(w,
x
)
.
L(f
;
x
,Ω
y
)
=
min
π
y
∈
(4.25)
Ω
y
Since the size of the constraint set is exponentially large, it is costly to find the
permutation with the smallest loss in each round of iteration. In [
31
] an effective
way of conducting the search is proposed. That is, one can first sort the objects
according to their ratings in the ground truth. Then for the objects having the same
rating, one sorts them in descending order of their ranking scores. It has been proven
that the resultant permutation has the smallest loss. Also, the minimal loss changes
with respect to
w
, which suggests that the optimization process will be an iterative
one. It has been proven that this iterative process can converge in polynomial time.
As an extension of ListMLE, in [
15
], a Gamma distribution based prior is intro
duced to regularize the likelihood loss. The corresponding algorithm is adopted in
applications like movie ranking and driver ranking, and have been shown to be very
effective.
4.3.3 Ranking Using Cumulative Distribution Networks
In [
16
], a ranking method based on cumulative distribution networks is proposed,
which can be regarded as an extension of ListNet and ListMLE.
As pointed out in [
16
], unlike standard regression or classification in which we
predict outputs independently, in ranking we are interested in predicting structured
outputs so that misranking one object can significantly affect whether we correctly
rank the other objects. For this purpose, cumulative distribution network (CDN)
is used, which is an undirected graphical model whose joint cumulative density
function (CDF)
F(z)
over a set of random variables is represented as a product over
functions defined over subsets of these variables. Mathematically,
F(z)
=
φ
c
(z
c
),
(4.26)
c
where
φ
c
is a potential function, and
c
is a clique in the graphical model.