4.2.1.4 SmoothRank
In [7], an idea similar to that of Approximate Rank is proposed: the evaluation measures are smoothed by approximating the rank positions. The slight difference lies in how the approximation is defined and how the optimization is performed. Again, we use NDCG as an example for illustration; note that the same idea also applies to MAP and many other evaluation measures.
In [7], NDCG is rewritten by using both the indexes of the documents and the positions in the ranking result:
\[
\mathrm{NDCG} = \frac{1}{Z_m}\sum_{j=1}^{m}\sum_{u=1}^{m} G(y_j)\,\eta(u)\, I_{\{x_j = x_{\pi^{-1}(u)}\}},
\tag{4.12}
\]
where $I_{\{x_j = x_{\pi^{-1}(u)}\}}$ indicates whether document $x_j$ is ranked at the $u$-th position.
Then the following soft version of the indicator function $I_{\{x_j = x_{\pi^{-1}(u)}\}}$ is introduced:
\[
h_{ju} = \frac{e^{-(f(x_j) - f(x_{\pi^{-1}(u)}))^2/\sigma}}{\sum_{l=1}^{m} e^{-(f(x_l) - f(x_{\pi^{-1}(u)}))^2/\sigma}}.
\tag{4.13}
\]
With $h_{ju}$, one can obtain the smoothed version of NDCG and define the loss function on its basis:
\[
\begin{aligned}
L(f;\mathbf{x},\mathbf{y}) &= 1 - \frac{1}{Z_m}\sum_{j=1}^{m}\sum_{u=1}^{m} G(y_j)\,\eta(u)\,h_{ju} \\
&= 1 - \frac{1}{Z_m}\sum_{j=1}^{m}\sum_{u=1}^{m} G(y_j)\,\eta(u)\,
\frac{e^{-(f(x_j)-f(x_{\pi^{-1}(u)}))^2/\sigma}}{\sum_{l=1}^{m} e^{-(f(x_l)-f(x_{\pi^{-1}(u)}))^2/\sigma}}.
\end{aligned}
\tag{4.14}
\]
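To make the construction concrete, here is a minimal NumPy sketch of Eqs. (4.13) and (4.14) for a single query. The function name smooth_rank_loss, the exponential gain $G(y)=2^y-1$, the logarithmic discount $\eta(u)$, and the use of the maximum DCG as the normalizer $Z_m$ are illustrative assumptions, not details taken from [7].

```python
import numpy as np

def smooth_rank_loss(scores, labels, sigma):
    """Return 1 minus the smoothed NDCG of Eq. (4.14) for a single query."""
    m = len(scores)
    order = np.argsort(-scores)                       # pi^{-1}: document index at position u
    gains = 2.0 ** labels - 1.0                       # assumed gain G(y_j) = 2^{y_j} - 1
    discounts = 1.0 / np.log2(np.arange(m) + 2.0)     # assumed discount eta(u), u = 0..m-1
    z_m = np.sum(np.sort(gains)[::-1] * discounts)    # Z_m: maximum DCG, the NDCG normalizer

    # h[j, u]: soft indicator that document j sits at position u, Eq. (4.13)
    diff = scores[:, None] - scores[order][None, :]   # f(x_j) - f(x_{pi^{-1}(u)})
    logits = -(diff ** 2) / sigma
    h = np.exp(logits - logits.max(axis=0, keepdims=True))
    h /= h.sum(axis=0, keepdims=True)                 # normalize over documents (the sum over l)

    smoothed_ndcg = np.sum(gains[:, None] * discounts[None, :] * h) / z_m
    return 1.0 - smoothed_ndcg

# As sigma -> 0, h approaches the hard indicator of Eq. (4.12), so the loss
# approaches 1 - NDCG of the ranking induced by the current scores.
scores = np.array([2.1, 0.3, 1.5])
labels = np.array([2.0, 0.0, 1.0])
print(smooth_rank_loss(scores, labels, sigma=1.0))    # smooth surrogate
print(smooth_rank_loss(scores, labels, sigma=1e-6))   # close to 1 - NDCG
```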
It is clear that $h_{ju}$ is a continuous function of the scoring function $f$, and so is the above loss function. Therefore, one can use the gradient descent method to optimize it.
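For a concrete picture, a minimal training loop for a linear scoring function $f(x)=w^{\top}x$ might look as follows. It reuses the smooth_rank_loss sketch above and, purely for brevity, estimates the gradient by finite differences; [7] works with the analytic gradient of the smoothed measure. The function name train and all hyper-parameter values are illustrative assumptions.

```python
import numpy as np

def train(X, labels, sigma, lr=0.1, steps=200, eps=1e-5, w_init=None):
    """Gradient descent on the smoothed loss for a linear scorer f(x) = w.x."""
    w = np.zeros(X.shape[1]) if w_init is None else w_init.copy()
    for _ in range(steps):
        base = smooth_rank_loss(X @ w, labels, sigma)
        grad = np.zeros_like(w)
        for k in range(len(w)):                       # finite-difference gradient estimate
            w_plus = w.copy()
            w_plus[k] += eps
            grad[k] = (smooth_rank_loss(X @ w_plus, labels, sigma) - base) / eps
        w -= lr * grad                                # gradient-descent update
    return w
```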
Similar to the discussion about Approximate Rank, the choice of the smoothing parameter σ is critical: on the one hand, if it is small, the objective function will be very close to the original evaluation measure, and is therefore highly non-smooth and difficult to optimize; on the other hand, if it is large, the objective function is smooth and easy to optimize, but substantially different from the corresponding evaluation measure. In [7], a deterministic annealing strategy is used to assist the optimization. This procedure can be seen as a homotopy method, in which a series of functions is constructed: the first one is easy to optimize, the last one is the function of interest, and each function in between can be seen as a deformation of the previous one. The homotopy method iteratively minimizes each of these functions, starting from the minimizer of the previous function. In particular, the deformation is controlled by σ: one starts with a large σ and reduces it step by step, each time initializing the optimization at the minimizer obtained for the previous value of σ.
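Following this description, a sketch of the annealing schedule could look as follows, reusing the hypothetical train routine from above. The initial σ, the shrinking factor, and the number of stages are arbitrary illustrative choices, not values from [7].

```python
def anneal(X, labels, sigma0=10.0, shrink=0.5, stages=8):
    """Homotopy/deterministic-annealing loop: shrink sigma, warm-start each stage."""
    w, sigma = None, sigma0
    for _ in range(stages):
        w = train(X, labels, sigma, w_init=w)   # start from the previous minimizer
        sigma *= shrink                         # deform the objective toward the true measure
    return w
```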