rewritten as³

  \mathrm{NDCG} = \frac{1}{Z_m} \sum_{j=1}^{m} \frac{G(y_j)}{\log(1 + \pi(x_j))},   (4.8)
where π(x j ) is the position of document x j in the ranking result π , which can be
calculated as
  \pi(x_j) = 1 + \sum_{u \neq j} I_{\{f(x_j) - f(x_u) < 0\}}.   (4.9)
From the above equation, one can clearly see where the non-smooth nature of NDCG comes from: NDCG is a smooth function of the rank position, but the rank position is a non-smooth function of the ranking scores because of the indicator function.
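To make the non-smoothness concrete, here is a minimal sketch of the position computation in (4.9); the function and variable names are illustrative, not from [25]. An arbitrarily small change in a score can flip one indicator and make the position jump by a whole rank.

```python
def exact_position(scores, j):
    """Rank position of document j per Eq. (4.9):
    1 plus the number of documents scored higher than document j.
    The indicator I{f(x_j) - f(x_u) < 0} is what makes this a
    step (non-smooth) function of the scores."""
    return 1 + sum(1 for u, s_u in enumerate(scores)
                   if u != j and scores[j] - s_u < 0)

scores = [2.0, 0.5, 1.3]
positions = [exact_position(scores, j) for j in range(len(scores))]
# positions == [1, 3, 2]: document 0 scores highest, document 1 lowest.
```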
The key idea in [25] is to approximate the indicator function by a sigmoid function,⁴ such that the position can be approximated by a smooth function of the ranking scores:
  \hat{\pi}(x_j) = 1 + \sum_{u \neq j} \frac{\exp(-\alpha(f(x_j) - f(x_u)))}{1 + \exp(-\alpha(f(x_j) - f(x_u)))},   (4.10)

where α > 0 is a scaling constant.
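A minimal sketch of the smooth position in (4.10), with illustrative names, showing that the approximation tightens toward the exact positions as α grows:

```python
import math

def approx_position(scores, j, alpha):
    """Smooth rank position per Eq. (4.10): each indicator is replaced
    by a logistic sigmoid of the score difference, scaled by alpha."""
    pos = 1.0
    for u, s_u in enumerate(scores):
        if u == j:
            continue
        d = scores[j] - s_u
        pos += math.exp(-alpha * d) / (1.0 + math.exp(-alpha * d))
    return pos

scores = [2.0, 0.5, 1.3]   # exact positions are [1, 3, 2]
for alpha in (1.0, 10.0, 100.0):
    print([round(approx_position(scores, j, alpha), 4) for j in range(3)])
# As alpha grows, the smooth positions approach the exact ones.
```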
By substituting π(x_j) in (4.8) by π̂(x_j), one can get the approximation for NDCG (denoted as AppNDCG), and then define the loss function as (1 − AppNDCG),

  L(f; \mathbf{x}, \mathbf{y}) = 1 - \frac{1}{Z_m} \sum_{j=1}^{m} \frac{G(y_j)}{\log(1 + \hat{\pi}(x_j))}.   (4.11)
Since this loss function is continuous and differentiable with respect to the scoring function, one can simply use the gradient descent method to minimize it.
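The optimization step can be sketched as follows. This is only an illustration: the linear scorer, the base-2 gains G(y) = 2^y − 1, and the finite-difference gradient are assumptions made here for brevity; [25] would work with the analytic gradient of (4.11).

```python
import math

def app_ndcg_loss(w, X, y, alpha=10.0):
    """(1 - AppNDCG) for a linear scorer f(x) = w . x, per Eq. (4.11).
    Gains G(y) = 2^y - 1; discount 1/log(1 + position); Z_m is the
    ideal DCG (best gains placed at the top positions)."""
    m = len(X)
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in X]
    def smooth_pos(j):                      # Eq. (4.10)
        p = 1.0
        for u in range(m):
            if u != j:
                d = scores[j] - scores[u]
                p += math.exp(-alpha * d) / (1.0 + math.exp(-alpha * d))
        return p
    gains = [2.0 ** yj - 1.0 for yj in y]
    z = sum(g / math.log(1.0 + r)
            for r, g in enumerate(sorted(gains, reverse=True), start=1))
    app = sum(gains[j] / math.log(1.0 + smooth_pos(j)) for j in range(m)) / z
    return 1.0 - app

def grad_descent(X, y, steps=200, lr=0.5, eps=1e-5):
    """Plain gradient descent using a forward finite-difference gradient."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        base = app_ndcg_loss(w, X, y)
        grad = []
        for i in range(len(w)):
            w[i] += eps
            grad.append((app_ndcg_loss(w, X, y) - base) / eps)
            w[i] -= eps
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

X = [[1.0], [0.5], [0.0]]   # one feature per document
y = [0, 1, 2]               # relevance decreases as the feature grows
w = grad_descent(X, y)      # learns a negative weight to rank correctly
```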
The approximation accuracy is analyzed in [25]. The basic conclusion is that when α is set sufficiently large, the approximation becomes very accurate. In other words, if one can find the optimum of the proposed loss function, it is very likely that one will also obtain the optimum of NDCG.
Note that the more accurate the approximation is, the more "non-smooth" the loss function becomes, and correspondingly the optimization process becomes less robust. Usually one needs to adopt a robust optimization algorithm, such as simulated annealing or random optimization, to ensure that a good solution to the optimization problem is found.
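A generic simulated-annealing sketch is shown below; it is not the specific procedure of [25], and the jagged toy objective merely stands in for the nearly non-smooth loss obtained with a large α. The key property is that uphill moves are sometimes accepted, so the search can escape the flat or jagged regions where gradient descent stalls.

```python
import math
import random

def simulated_annealing(loss, w0, steps=2000, t0=1.0, step_size=0.3, seed=0):
    """Minimize `loss` by a random walk that accepts uphill moves with
    probability exp(-delta / t), where t follows a linear cooling schedule."""
    rng = random.Random(seed)
    w, best = list(w0), list(w0)
    cur = best_val = loss(w)
    for k in range(steps):
        t = t0 * (1.0 - k / steps) + 1e-9          # cool down over time
        cand = [wi + rng.gauss(0.0, step_size) for wi in w]
        val = loss(cand)
        if val < cur or rng.random() < math.exp(-(val - cur) / t):
            w, cur = cand, val                      # move (possibly uphill)
            if val < best_val:
                best, best_val = list(cand), val    # track the best so far
    return best, best_val

# Toy jagged objective (global minimum near x = 2) standing in for a
# large-alpha AppNDCG loss:
def jagged(w):
    x = w[0]
    return (x - 2.0) ** 2 + 0.5 * math.sin(20.0 * x)

w, v = simulated_annealing(jagged, [0.0])
```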
³ Note that here we directly assume η(r) = 1/log(1 + r) for simplicity, although other forms of discount function are also valid in defining NDCG.
⁴ Note that the sigmoid function is actually a large family of functions. For example, the widely used logistic function is a special case of the sigmoid function; other members of the family include the ordinary arc-tangent function, the hyperbolic tangent function, and the error function. Here the logistic function is taken as an example for the derivation.