Information Technology Reference
In-Depth Information
4.3.1 ListNet
In ListNet [ 4 ], the loss function is defined using the probability distribution on per-
mutations.
Actually the probability distributions on permutations have been well studied in
the field of probability theory. Many famous models have been proposed to repre-
sent permutation probability distributions, such as the Plackett-Luce model [ 20 , 24 ]
and the Mallows model [ 21 ]. Since a permutation has a natural one-to-one corre-
spondence with a ranked list, these researches can be naturally applied to ranking.
ListNet [ 4 ] is just such an example, demonstrating how to apply the Plackett-Luce
model to learning to rank.
Given the ranking scores of the documents outputted by the scoring function f
(i.e., s
m
j =
f(x j ) ), the Plackett-Luce model defines a probabil-
ity for each possible permutation π of the documents, based on the chain rule, as
follows:
={
s j }
1 , where s j =
m
ϕ(s π 1 (j) )
u = 1 ϕ(s π 1 (u) ) ,
P(π
|
s )
=
(4.22)
j
=
1
where π 1 (j) denotes the document ranked at the j th position of permutation π
and ϕ is a transformation function, which can be linear, exponential, or sigmoid.
Please note that the Plackett-Luce model is scale invariant and translation invari-
ant in certain conditions. For example, when we use the exponential function as the
transformation function, after adding the same constant to all the ranking scores,
the permutation probability distribution defined by the Plackett-Luce model will
not change. When we use the linear function as the transformation function, after
multiplying all the ranking scores by the same constant, the permutation probabil-
ity distribution will not change. These properties are quite in accordance with our
intuitions on ranking.
With the Plackett-Luce model, for a given query q , ListNet first defines the per-
mutation probability distribution based on the scores given by the scoring func-
tion f . Then it defines another permutation probability distribution P y (π) based on
the ground truth label. For example, if the ground truth is given as relevance degrees,
it can be directly substituted into the Plackett-Luce model to obtain a probability
distribution. When the ground truth is given as a permutation π y (or a permutation
set Ω y ), one can define P y (π) as a delta function which only takes a value of 1
for this permutation (or the permutations in the set), and takes a value of 0 for all
the other permutations. One can also first employ a mapping function to map the
positions in the ground truth permutation to real-valued scores and then use ( 4.22 )
to compute the probability distribution. Furthermore, one can also choose to use the
Mallows model [ 21 ] to define P y (π) by taking the ground truth permutation as the
centroid in the model.
For the next step, ListNet uses the K-L divergence between the probability distri-
bution for the ranking model and that for the ground truth to define its loss function
Search WWH ::




Custom Search