rewritten as³

  \mathrm{NDCG} = \frac{1}{Z_m} \sum_{j=1}^{m} \frac{G(y_j)}{\log(1 + \pi(x_j))},   (4.8)
where π(x j ) is the position of document x j in the ranking result π , which can be
calculated as
  \pi(x_j) = 1 + \sum_{u \neq j} I_{\{f(x_j) - f(x_u) < 0\}}.   (4.9)
From the above equation, one can clearly see where the non-smooth nature of NDCG comes from: NDCG is a smooth function of the rank position, but the rank position is a non-smooth function of the ranking scores because of the indicator function.
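To make the non-smoothness concrete, here is a minimal sketch of the position computation in (4.9); the function and variable names are illustrative, not from [25]. An arbitrarily small change in a score can flip one indicator and make the position jump by a whole rank.

```python
def exact_position(scores, j):
    """Rank position of document j per Eq. (4.9):
    1 plus the number of documents scored higher than document j.
    The indicator I{f(x_j) - f(x_u) < 0} is what makes this a
    step (non-smooth) function of the scores."""
    return 1 + sum(1 for u, s_u in enumerate(scores)
                   if u != j and scores[j] - s_u < 0)

scores = [2.0, 0.5, 1.3]
positions = [exact_position(scores, j) for j in range(len(scores))]
# positions == [1, 3, 2]: document 0 scores highest, document 1 lowest.
```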
The key idea in [25] is to approximate the indicator function by a sigmoid function,⁴ such that the position can be approximated by a smooth function of the ranking scores:
  \hat{\pi}(x_j) = 1 + \sum_{u \neq j} \frac{\exp(-\alpha(f(x_j) - f(x_u)))}{1 + \exp(-\alpha(f(x_j) - f(x_u)))},   (4.10)

where α > 0 is a scaling constant.
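A minimal sketch of the smooth position in (4.10), with illustrative names, showing that the approximation tightens toward the exact positions as α grows:

```python
import math

def approx_position(scores, j, alpha):
    """Smooth rank position per Eq. (4.10): each indicator is replaced
    by a logistic sigmoid of the score difference, scaled by alpha."""
    pos = 1.0
    for u, s_u in enumerate(scores):
        if u == j:
            continue
        d = scores[j] - s_u
        pos += math.exp(-alpha * d) / (1.0 + math.exp(-alpha * d))
    return pos

scores = [2.0, 0.5, 1.3]   # exact positions are [1, 3, 2]
for alpha in (1.0, 10.0, 100.0):
    print([round(approx_position(scores, j, alpha), 4) for j in range(3)])
# As alpha grows, the smooth positions approach the exact ones.
```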
By substituting π(x_j) in (4.8) by π̂(x_j), one can get the approximation for NDCG (denoted as AppNDCG), and then define the loss function as (1 − AppNDCG),

  L(f; \mathbf{x}, \mathbf{y}) = 1 - \frac{1}{Z_m} \sum_{j=1}^{m} \frac{G(y_j)}{\log(1 + \hat{\pi}(x_j))}.   (4.11)
Since this loss function is continuous and differentiable with respect to the scoring function, one can simply use the gradient descent method to minimize it.
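The optimization step can be sketched as follows. This is only an illustration: the linear scorer, the base-2 gains G(y) = 2^y − 1, and the finite-difference gradient are assumptions made here for brevity; [25] would work with the analytic gradient of (4.11).

```python
import math

def app_ndcg_loss(w, X, y, alpha=10.0):
    """(1 - AppNDCG) for a linear scorer f(x) = w . x, per Eq. (4.11).
    Gains G(y) = 2^y - 1; discount 1/log(1 + position); Z_m is the
    ideal DCG (best gains placed at the top positions)."""
    m = len(X)
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in X]
    def smooth_pos(j):                      # Eq. (4.10)
        p = 1.0
        for u in range(m):
            if u != j:
                d = scores[j] - scores[u]
                p += math.exp(-alpha * d) / (1.0 + math.exp(-alpha * d))
        return p
    gains = [2.0 ** yj - 1.0 for yj in y]
    z = sum(g / math.log(1.0 + r)
            for r, g in enumerate(sorted(gains, reverse=True), start=1))
    app = sum(gains[j] / math.log(1.0 + smooth_pos(j)) for j in range(m)) / z
    return 1.0 - app

def grad_descent(X, y, steps=200, lr=0.5, eps=1e-5):
    """Plain gradient descent using a forward finite-difference gradient."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        base = app_ndcg_loss(w, X, y)
        grad = []
        for i in range(len(w)):
            w[i] += eps
            grad.append((app_ndcg_loss(w, X, y) - base) / eps)
            w[i] -= eps
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

X = [[1.0], [0.5], [0.0]]   # one feature per document
y = [0, 1, 2]               # relevance decreases as the feature grows
w = grad_descent(X, y)      # learns a negative weight to rank correctly
```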
The approximation accuracy is analyzed in [25]. The basic conclusion is that when α is set sufficiently large, the approximation becomes very accurate. In other words, if one can find the optimum of the proposed loss function, it is very likely that one will also obtain the optimum of NDCG.
Note that the more accurate the approximation is, the more "non-smooth" the loss function becomes, and correspondingly the optimization process becomes less robust. Usually one needs to adopt a robust optimization algorithm, such as simulated annealing or random optimization, to ensure that a good solution to the optimization problem is found.
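A generic simulated-annealing sketch is shown below; it is not the specific procedure of [25], and the jagged toy objective merely stands in for the nearly non-smooth loss obtained with a large α. The key property is that uphill moves are sometimes accepted, so the search can escape the flat or jagged regions where gradient descent stalls.

```python
import math
import random

def simulated_annealing(loss, w0, steps=2000, t0=1.0, step_size=0.3, seed=0):
    """Minimize `loss` by a random walk that accepts uphill moves with
    probability exp(-delta / t), where t follows a linear cooling schedule."""
    rng = random.Random(seed)
    w, best = list(w0), list(w0)
    cur = best_val = loss(w)
    for k in range(steps):
        t = t0 * (1.0 - k / steps) + 1e-9          # cool down over time
        cand = [wi + rng.gauss(0.0, step_size) for wi in w]
        val = loss(cand)
        if val < cur or rng.random() < math.exp(-(val - cur) / t):
            w, cur = cand, val                      # move (possibly uphill)
            if val < best_val:
                best, best_val = list(cand), val    # track the best so far
    return best, best_val

# Toy jagged objective (global minimum near x = 2) standing in for a
# large-alpha AppNDCG loss:
def jagged(w):
    x = w[0]
    return (x - 2.0) ** 2 + 0.5 * math.sin(20.0 * x)

w, v = simulated_annealing(jagged, [0.0])
```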
³ Note that here we directly assume η(r) = 1/log(1 + r) for simplicity, although other forms of discount function are also valid in defining NDCG.
⁴ Note that the sigmoid function is actually a large family of functions. For example, the widely used logistic function is a special case of the sigmoid function; other members of the family include the ordinary arc-tangent function, the hyperbolic tangent function, and the error function. Here the logistic function is taken as an example for the derivation.