Information Technology Reference
InDepth Information
4.3.1 ListNet
In ListNet [
4
], the loss function is defined using the probability distribution on per
mutations.
Actually the probability distributions on permutations have been well studied in
the field of probability theory. Many famous models have been proposed to repre
sent permutation probability distributions, such as the PlackettLuce model [
20
,
24
]
and the Mallows model [
21
]. Since a permutation has a natural onetoone corre
spondence with a ranked list, these researches can be naturally applied to ranking.
ListNet [
4
] is just such an example, demonstrating how to apply the PlackettLuce
model to learning to rank.
Given the ranking scores of the documents outputted by the scoring function
f
(i.e.,
s
m
j
=
f(x
j
)
), the PlackettLuce model defines a probabil
ity for each possible permutation
π
of the documents, based on the chain rule, as
follows:
={
s
j
}
1
, where
s
j
=
m
ϕ(s
π
−
1
(j)
)
u
=
1
ϕ(s
π
−
1
(u)
)
,
P(π

s
)
=
(4.22)
j
=
1
where
π
−
1
(j)
denotes the document ranked at the
j
th position of permutation
π
and
ϕ
is a transformation function, which can be linear, exponential, or sigmoid.
Please note that the PlackettLuce model is scale invariant and translation invari
ant in certain conditions. For example, when we use the exponential function as the
transformation function, after adding the same constant to all the ranking scores,
the permutation probability distribution defined by the PlackettLuce model will
not change. When we use the linear function as the transformation function, after
multiplying all the ranking scores by the same constant, the permutation probabil
ity distribution will not change. These properties are quite in accordance with our
intuitions on ranking.
With the PlackettLuce model, for a given query
q
, ListNet first defines the per
mutation probability distribution based on the scores given by the scoring func
tion
f
. Then it defines another permutation probability distribution
P
y
(π)
based on
the ground truth label. For example, if the ground truth is given as relevance degrees,
it can be directly substituted into the PlackettLuce model to obtain a probability
distribution. When the ground truth is given as a permutation
π
y
(or a permutation
set
Ω
y
), one can define
P
y
(π)
as a delta function which only takes a value of 1
for this permutation (or the permutations in the set), and takes a value of 0 for all
the other permutations. One can also first employ a mapping function to map the
positions in the ground truth permutation to realvalued scores and then use (
4.22
)
to compute the probability distribution. Furthermore, one can also choose to use the
Mallows model [
21
] to define
P
y
(π)
by taking the ground truth permutation as the
centroid in the model.
For the next step, ListNet uses the KL divergence between the probability distri
bution for the ranking model and that for the ground truth to define its loss function