4.2.1.4 SmoothRank
In [7], an idea similar to that of Approximate Rank is proposed: the evaluation measures are smoothed by approximating the rank positions. The slight difference lies in how the approximation is defined and how the optimization is performed. Again, we use NDCG as an example for illustration; note that the same idea also applies to MAP and many other evaluation measures.
In [7], NDCG is rewritten by using both the indexes of the documents and the positions in the ranking result:
\[
\mathrm{NDCG} = \frac{1}{Z_m}\sum_{j=1}^{m}\sum_{u=1}^{m} G(y_j)\,\eta(u)\, I_{\{x_j = x_{\pi^{-1}(u)}\}},
\tag{4.12}
\]
where $I_{\{x_j = x_{\pi^{-1}(u)}\}}$ indicates whether document $x_j$ is ranked at the $u$-th position.
Then the following soft version of the indicator function $I_{\{x_j = x_{\pi^{-1}(u)}\}}$ is introduced:
\[
h_{ju} = \frac{e^{-(f(x_j) - f(x_{\pi^{-1}(u)}))^2/\sigma}}{\sum_{l=1}^{m} e^{-(f(x_l) - f(x_{\pi^{-1}(u)}))^2/\sigma}}.
\tag{4.13}
\]
With $h_{ju}$, one can obtain the smoothed version of NDCG and define the loss function on its basis:
\[
\begin{aligned}
L(f;\mathbf{x},\mathbf{y}) &= 1 - \frac{1}{Z_m}\sum_{j=1}^{m}\sum_{u=1}^{m} G(y_j)\,\eta(u)\,h_{ju} \\
&= 1 - \frac{1}{Z_m}\sum_{j=1}^{m}\sum_{u=1}^{m} G(y_j)\,\eta(u)\,
\frac{e^{-(f(x_j)-f(x_{\pi^{-1}(u)}))^2/\sigma}}{\sum_{l=1}^{m} e^{-(f(x_l)-f(x_{\pi^{-1}(u)}))^2/\sigma}}.
\end{aligned}
\tag{4.14}
\]
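To make the construction concrete, here is a minimal NumPy sketch of Eqs. (4.13) and (4.14) for a single query. The function name smooth_rank_loss, the exponential gain $G(y)=2^y-1$, the logarithmic discount $\eta(u)$, and the use of the maximum DCG as the normalizer $Z_m$ are illustrative assumptions, not details taken from [7].

```python
import numpy as np

def smooth_rank_loss(scores, labels, sigma):
    """Return 1 minus the smoothed NDCG of Eq. (4.14) for a single query."""
    m = len(scores)
    order = np.argsort(-scores)                       # pi^{-1}: document index at position u
    gains = 2.0 ** labels - 1.0                       # assumed gain G(y_j) = 2^{y_j} - 1
    discounts = 1.0 / np.log2(np.arange(m) + 2.0)     # assumed discount eta(u), u = 0..m-1
    z_m = np.sum(np.sort(gains)[::-1] * discounts)    # Z_m: maximum DCG, the NDCG normalizer

    # h[j, u]: soft indicator that document j sits at position u, Eq. (4.13)
    diff = scores[:, None] - scores[order][None, :]   # f(x_j) - f(x_{pi^{-1}(u)})
    logits = -(diff ** 2) / sigma
    h = np.exp(logits - logits.max(axis=0, keepdims=True))
    h /= h.sum(axis=0, keepdims=True)                 # normalize over documents (the sum over l)

    smoothed_ndcg = np.sum(gains[:, None] * discounts[None, :] * h) / z_m
    return 1.0 - smoothed_ndcg

# As sigma -> 0, h approaches the hard indicator of Eq. (4.12), so the loss
# approaches 1 - NDCG of the ranking induced by the current scores.
scores = np.array([2.1, 0.3, 1.5])
labels = np.array([2.0, 0.0, 1.0])
print(smooth_rank_loss(scores, labels, sigma=1.0))    # smooth surrogate
print(smooth_rank_loss(scores, labels, sigma=1e-6))   # close to 1 - NDCG
```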
It is clear that $h_{ju}$ is a continuous function of the scoring function $f$, and so is the above loss function. Therefore, one can use the gradient descent method to optimize it.
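For a concrete picture, a minimal training loop for a linear scoring function $f(x)=w^{\top}x$ might look as follows. It reuses the smooth_rank_loss sketch above and, purely for brevity, estimates the gradient by finite differences; [7] works with the analytic gradient of the smoothed measure. The function name train and all hyper-parameter values are illustrative assumptions.

```python
import numpy as np

def train(X, labels, sigma, lr=0.1, steps=200, eps=1e-5, w_init=None):
    """Gradient descent on the smoothed loss for a linear scorer f(x) = w.x."""
    w = np.zeros(X.shape[1]) if w_init is None else w_init.copy()
    for _ in range(steps):
        base = smooth_rank_loss(X @ w, labels, sigma)
        grad = np.zeros_like(w)
        for k in range(len(w)):                       # finite-difference gradient estimate
            w_plus = w.copy()
            w_plus[k] += eps
            grad[k] = (smooth_rank_loss(X @ w_plus, labels, sigma) - base) / eps
        w -= lr * grad                                # gradient-descent update
    return w
```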
Similar to the discussion about Approximate Rank, the choice of the smoothing parameter σ is critical: on the one hand, if it is small, the objective function will be very close to the original evaluation measure, and is therefore highly non-smooth and difficult to optimize; on the other hand, if it is large, the objective function is smooth and easy to optimize, but substantially different from the corresponding evaluation measure. In [7], a deterministic annealing strategy is used to assist the optimization. This procedure can be seen as a homotopy method, in which a series of functions is constructed: the first one is easy to optimize, the last one is the function of interest, and each function in between can be seen as a deformation of the previous one. The homotopy method iteratively minimizes each of these functions, starting from the minimizer of the previous function. In particular, the deformation is controlled by σ: one starts with a large σ and reduces it step by step, each time initializing the optimization at the minimizer obtained for the previous value of σ.
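Following this description, a sketch of the annealing schedule could look as follows, reusing the hypothetical train routine from above. The initial σ, the shrinking factor, and the number of stages are arbitrary illustrative choices, not values from [7].

```python
def anneal(X, labels, sigma0=10.0, shrink=0.5, stages=8):
    """Homotopy/deterministic-annealing loop: shrink sigma, warm-start each stage."""
    w, sigma = None, sigma0
    for _ in range(stages):
        w = train(X, labels, sigma, w_init=w)   # start from the previous minimizer
        sigma *= shrink                         # deform the objective toward the true measure
    return w
```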