3.3.7 LambdaRank

In [7], another way of introducing position-based weights to the pairwise loss function is proposed. In particular, the evaluation measures (which are position based) are directly used to define the gradient with respect to each document pair in the training process. The hope is that the corresponding implicit loss function can be more consistent with the way that we evaluate the final ranking results.

Here we use an example to illustrate the basic idea. Suppose we have only two relevant documents $x_1$ and $x_2$, and their current ranks are as shown in Fig. 3.11. Suppose we are using NDCG@1 as the evaluation measure. Then, it is clear that if we can move either $x_1$ or $x_2$ to the top position of the ranked list, we will achieve the maximum NDCG@1.
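To make this concrete, here is a small sketch (with hypothetical relevance labels and the usual $2^y - 1$ gain) showing that NDCG@1 depends only on which document sits at the top position:

```python
def ndcg_at_1(labels_in_ranked_order):
    # NDCG@1 with gain 2^y - 1: the discount at rank 1 is 1, and the ideal DCG@1
    # puts the most relevant available document on top.
    ideal = 2 ** max(labels_in_ranked_order) - 1
    return (2 ** labels_in_ranked_order[0] - 1) / ideal if ideal > 0 else 0.0

# Hypothetical labels: x1 at rank 2, x2 at rank 5, irrelevant documents elsewhere.
print(ndcg_at_1([0, 1, 0, 0, 1]))  # 0.0 -- an irrelevant document is on top
print(ndcg_at_1([1, 0, 0, 0, 1]))  # 1.0 -- moving x1 (or x2) to the top maximizes NDCG@1
```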

It is clearly more convenient to move $x_1$ up, since the effort will be much smaller than that for $x_2$. So, we can define (but not compute) the “gradient” with respect to the ranking score of $x_1$ (denoted as $s_1 = f(x_1)$) as larger than that with respect to the ranking score of $x_2$ (denoted as $s_2 = f(x_2)$). In other words, we can consider that there is an underlying implicit loss function $L$ in the optimization process, which suggests

$$
\frac{\partial L}{\partial s_1} > \frac{\partial L}{\partial s_2}. \qquad (3.30)
$$

The above “gradient” is called the lambda function, and this is why the algorithm is named LambdaRank. When we use NDCG to guide the training process, a specific form of the lambda function is given in [7]:

$$
\lambda = Z_m \,\frac{2^{y_u} - 2^{y_v}}{1 + \exp\bigl(f(x_u) - f(x_v)\bigr)}\,\Bigl(\eta\bigl(r(x_u)\bigr) - \eta\bigl(r(x_v)\bigr)\Bigr), \qquad (3.31)
$$

where $r(\cdot)$ represents the rank position of a document in the previous iteration of training.
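As an illustration of Eq. (3.31), the following sketch computes the lambda for a single document pair. The discount $\eta(r) = 1/\log_2(1+r)$ and the treatment of $Z_m$ as a given constant are assumptions for this sketch; the function and argument names are ours, not from [7].

```python
import math

def discount(rank):
    # Position discount eta(r); 1 / log2(1 + r) is assumed, matching the usual NDCG discount.
    return 1.0 / math.log2(1.0 + rank)

def lambda_uv(y_u, y_v, s_u, s_v, r_u, r_v, z_m):
    """Lambda for the pair (x_u, x_v) following Eq. (3.31).

    y_*: relevance labels, s_*: current scores f(x_*),
    r_*: rank positions from the previous iteration, z_m: NDCG normalizer.
    """
    gain_diff = 2.0 ** y_u - 2.0 ** y_v
    discount_diff = discount(r_u) - discount(r_v)
    return z_m * gain_diff * discount_diff / (1.0 + math.exp(s_u - s_v))
```

Note that pairs with equal labels contribute nothing, since $2^{y_u} - 2^{y_v} = 0$.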

For each pair of documents $x_u$ and $x_v$, in each round of optimization, their scores are updated by $+\lambda$ and $-\lambda$, respectively. In [15], the lambda functions for MAP and MRR are derived, and the local optimality of the LambdaRank algorithm is discussed. According to the experimental results, the LambdaRank algorithm can reach a local optimum of the objective function and can achieve competitive ranking performance.
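Below is a rough illustration of one such update, reusing the lambda_uv sketch above on the toy ranking from the earlier example. Which document plays the role of $x_u$ (here, the one currently ranked higher) and the value $Z_m = 1$ are assumptions made only for this sketch.

```python
# Pair from the toy example: an irrelevant document currently at rank 1
# (score 0.9) and the relevant x1 at rank 2 (score 0.7); Z_m = 1 for simplicity.
s_u, s_v = 0.9, 0.7
lam = lambda_uv(y_u=0, y_v=1, s_u=s_u, s_v=s_v, r_u=1, r_v=2, z_m=1.0)

s_u, s_v = s_u + lam, s_v - lam   # update the pair by +lambda and -lambda
print(round(lam, 3), round(s_u, 3), round(s_v, 3))
# lam is negative here, so x1 overtakes the irrelevant document after the update.
```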

Fig. 3.11 Optimizing NDCG@1 by LambdaRank