In this chapter, we have reviewed the relationships between different loss functions
in learning to rank and the evaluation measures. These discussions help explain why
different learning-to-rank algorithms perform reasonably well in practice (see the
experimental results in Chap. 11).
While the analyses introduced in this chapter are appealing, several issues remain
unresolved. First, although the essential loss can be used to explain the relationship
between measure-based ranking errors and several loss functions in the pairwise and
listwise approaches, it is not clear whether it can be extended to explain the pointwise
approach. Second, from a machine learning point of view, being an upper bound of a
measure-based ranking error might not be sufficient for being a good loss function.
The reason is that what we really care about is the optimal solution with regard to
the loss function: even if a loss upper-bounds a measure-based ranking error
everywhere, its optimum may still be far away from the optimum of the measure-based
ranking error.
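A toy example (hypothetical, not from the book) makes this gap concrete: on [0, 1], let g play the role of a measure-based ranking error and f a surrogate loss with f ≥ g everywhere. Since f − g = (x − 1)(x − 2) ≥ 0 on [0, 1], f is a valid upper bound, yet f is minimized at x = 1 while g is minimized at x = 0, so minimizing the surrogate lands at the worst possible value of the target error:

```python
# Toy illustration (hypothetical functions, not from the book):
# g plays the role of a measure-based ranking error on [0, 1],
# f plays the role of a surrogate loss that upper-bounds g there.
def g(x):           # target error: minimized at x = 0
    return x

def f(x):           # surrogate loss: minimized at x = 1
    return 1.0 + (1.0 - x) ** 2

xs = [i / 1000 for i in range(1001)]
assert all(f(x) >= g(x) for x in xs)   # f >= g everywhere on the grid

x_f = min(xs, key=f)    # minimizer of the surrogate -> 1.0
x_g = min(xs, key=g)    # minimizer of the target    -> 0.0
print(x_f, x_g, g(x_f))  # the surrogate's minimizer attains the worst target error
```

The upper-bound property constrains the values of the loss, but says nothing about where its minimizer lies.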
The discussions on “tendency correlation” are one step toward solving the second
problem mentioned above. However, the tendency-correlation condition still seems
too strong and is sometimes unnecessary. A more principled solution could be
obtained by investigating the so-called “consistency” of the loss functions, which
exactly describes whether the optima with regard to the loss function and the
measure coincide. Consistency of learning methods has been well studied in
classification, but not yet in ranking. More discussions on this topic can be found
in Part VI. Please note that the discussions on consistency are valid only when the
number of training examples approaches infinity. In practice, however, the number
of training examples is always limited, and in this case the theoretical analysis may
not always accord with experimental results [9]. Therefore, much further work is
needed in order to predict the performance of a learning-to-rank method when the
sample size is small.
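For reference, one common formalization of consistency in binary classification (a standard statement, not specific to this book; the ranking analogue is the subject of Part VI) reads as follows. Let $R(f) = \mathbb{E}\,\mathbf{1}[\,y f(x) \le 0\,]$ denote the 0–1 risk and $R_\phi(f) = \mathbb{E}\,\phi(y f(x))$ the surrogate risk; then

```latex
\[
  \phi \text{ is consistent} \quad\Longleftrightarrow\quad
  \Bigl( R_\phi(f_n) \to \inf_f R_\phi(f) \Bigr)
  \;\Longrightarrow\;
  \Bigl( R(f_n) \to \inf_f R(f) \Bigr)
  \quad \text{for every sequence } \{f_n\}.
\]
```

That is, any sequence of functions driving the surrogate risk to its infimum must also drive the target risk to its infimum, which is precisely the "same optima" requirement discussed above.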
5.1 In this chapter, the relationships between (1 − MAP) and the pairwise loss func-
tions as well as the listwise loss function have been discussed. However, it has
not been discussed whether a similar relationship between (1 − MAP) and the
pointwise loss function also exists. Discuss this by extending the results
in [4] and [7].
5.2 Investigate whether one can obtain results similar to those presented in this
chapter for other evaluation measures, such as MRR and RC.
5.3 Show that although the zero value of the essential loss is a necessary and suf-
ficient condition for the zero value of (1 − NDCG) and (1 − MAP), when the
value is very small but non-zero, (1 − NDCG) and (1 − MAP) will not neces-
sarily be very small.
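As a numerical companion to Exercise 5.3, the sketch below does not compute the essential loss itself (see its definition earlier in the chapter); it only illustrates the related fact that a single misordered pair, when it involves the top position, already makes (1 − NDCG) and (1 − MAP) sizeable. Standard binary-relevance definitions of AP and NDCG are assumed (gain 2^r − 1, discount 1/log₂(i + 1)); the book's conventions may differ slightly.

```python
import math

def average_precision(rels):
    """rels: 0/1 relevance labels in ranked order."""
    hits, precisions = 0, []
    for i, r in enumerate(rels, start=1):
        if r:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / max(hits, 1)

def ndcg(rels):
    def dcg(labels):
        return sum((2 ** r - 1) / math.log2(i + 1)
                   for i, r in enumerate(labels, start=1))
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0

perfect = [1] + [0] * 9       # the single relevant document ranked first
one_swap = [0, 1] + [0] * 8   # one misordered pair: relevant document at rank 2

print(1 - average_precision(one_swap))   # 0.5   -> large, from a single swap
print(round(1 - ndcg(one_swap), 3))      # 0.369
```

With ten documents there are 45 document pairs, so a single misordered pair is a tiny pairwise error, yet (1 − MAP) jumps to 0.5 because the error occurs at the top of the list.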