Again, for ease of discussion, we sometimes also write NDCG as NDCG(π, y) or NDCG(f, x, y) (when π = sort ∘ f(x)).
4.2 Minimization of Measure-Specific Loss
The most straightforward choice might be to learn the ranking model by directly optimizing what is used to evaluate ranking performance. This is exactly the motivation of the first sub-category of the listwise approach, i.e., the direct optimization methods. However, the task is not as trivial as it seems. As mentioned before, evaluation measures such as NDCG and MAP are position based, and thus discontinuous and non-differentiable [26, 33]. The difficulty of optimizing such objective functions stems from the fact that most existing optimization techniques were developed to handle continuous and differentiable cases.
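To see why such measures resist gradient-based optimization, the rough sketch below (illustrative Python, not taken from the cited works; the 2^rel − 1 gain and log2 position discount are simply the common choices) computes NDCG from model scores. The measure depends on the scores only through the induced sorted order, so a small score perturbation either leaves NDCG unchanged or makes it jump when two documents swap positions.

```python
# Minimal NDCG sketch illustrating its position-based, piecewise-constant nature.
import numpy as np

def dcg(relevances):
    """DCG with the common 2^rel - 1 gain and log2(position + 1) discount."""
    relevances = np.asarray(relevances, dtype=float)
    positions = np.arange(1, len(relevances) + 1)
    return np.sum((2.0 ** relevances - 1.0) / np.log2(positions + 1))

def ndcg(scores, relevances):
    """NDCG(f, x, y): rank by descending score, then normalize by the ideal DCG."""
    order = np.argsort(scores)[::-1]                    # pi = sort by score
    ideal = dcg(sorted(relevances, reverse=True))       # DCG of the ideal ranking
    return dcg(np.asarray(relevances)[order]) / ideal if ideal > 0 else 0.0

scores = np.array([2.0, 1.9, 0.5])
labels = np.array([0, 2, 1])                            # graded relevance judgments
print(ndcg(scores, labels))                             # flat under tiny perturbations...
print(ndcg(scores + np.array([0.0, 0.2, 0.0]), labels)) # ...then jumps once the order flips
```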
To tackle these challenges, several attempts have been made. First, one can choose to optimize a continuous and differentiable approximation of the measure-based ranking error. By doing so, many existing optimization techniques can be leveraged. Example algorithms include SoftRank [27], Approximate Rank [25], and SmoothRank [7]. Second, one can alternatively optimize a continuous and differentiable (and sometimes even convex) bound of the measure-based ranking error. Example algorithms include SVM^map [36], SVM^ndcg [5], and PermuRank [33].¹ Third, one can choose to use optimization techniques that are able to handle non-smooth objectives. For example, one can leverage the Boosting framework for this purpose (the corresponding algorithm is called AdaRank [32]), or adopt genetic programming for the optimization (the corresponding algorithm is called RankGP [34]).
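To make the second strategy concrete in its simplest form, the toy sketch below follows the classification analogy of footnote 1 rather than any of the cited ranking algorithms (the specific margin values are my own illustrative assumption): the 0-1 loss is non-differentiable, while the exponential loss is a smooth, convex upper bound that gradient-based methods can minimize instead.

```python
# Toy comparison of the 0-1 loss with its convex exponential upper bound.
import numpy as np

margins = np.linspace(-2.0, 2.0, 9)        # y * f(x) for a few hypothetical examples
zero_one = (margins <= 0).astype(float)    # 0-1 loss: discontinuous at margin 0
exp_bound = np.exp(-margins)               # exponential loss: smooth, convex, >= 0-1 loss
for m, z, e in zip(margins, zero_one, exp_bound):
    print(f"margin={m:+.1f}  0-1 loss={z:.0f}  exp bound={e:.3f}")
```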
4.2.1 Measure Approximation
4.2.1.1 SoftRank
In SoftRank [27], it is assumed that the ranking of the documents is not simply determined by sorting them according to the scoring function. Instead, randomness is introduced into the ranking process by regarding the real score of a document as a random variable whose mean is the score given by the scoring function. In this way, a document can be ranked at any position, of course with different probabilities. For each such possible ranking, one can compute an NDCG value. Then the expectation of NDCG over all possible rankings can be used as a smooth approximation of the original evaluation measure. The detailed steps to achieve this goal are elaborated as follows.
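Before those detailed steps, a rough Monte Carlo sketch of the idea may help. It is not SoftRank's actual rank-distribution recursion (which [27] derives analytically); it simply samples Gaussian scores around the model scores and averages NDCG over the sampled rankings, so names such as `sigma` and `n_samples` are my own illustrative assumptions.

```python
# Monte Carlo illustration of expected NDCG under Gaussian score perturbations.
import numpy as np

rng = np.random.default_rng(0)

def ndcg(scores, rel):
    """Plain NDCG of the ranking induced by sorting `scores` in descending order."""
    order = np.argsort(scores)[::-1]
    disc = 1.0 / np.log2(np.arange(2, len(rel) + 2))
    dcg = np.sum((2.0 ** np.asarray(rel, float)[order] - 1.0) * disc)
    ideal = np.sum((2.0 ** np.sort(np.asarray(rel, float))[::-1] - 1.0) * disc)
    return dcg / ideal if ideal > 0 else 0.0

def expected_ndcg(mean_scores, rel, sigma=0.5, n_samples=2000):
    """Approximate E[NDCG] when each score is Gaussian around the model score."""
    samples = mean_scores + sigma * rng.standard_normal((n_samples, len(mean_scores)))
    return float(np.mean([ndcg(s, rel) for s in samples]))

mean_scores = np.array([2.0, 1.9, 0.5])
labels = np.array([0, 2, 1])
print(expected_ndcg(mean_scores, labels))   # in expectation, smooth in mean_scores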
¹ Actually, this trick has also been used in classification. That is, since the 0-1 classification loss is non-differentiable, convex upper bounds like the exponential loss have been used instead.