5.5 Summary

In this chapter, we have reviewed the relationships between different loss functions

in learning to rank and the evaluation measures. These discussions help explain why

different learning-to-rank algorithms perform reasonably well in practice (see the

experimental results in Chap. 11).

While the analyses introduced in this chapter are appealing, there are still several issues that have not been well solved. First, although the essential loss can be

used to explain the relationship between measure-based ranking errors and several

loss functions in the pairwise and listwise approaches, it is not clear whether it can

be extended to explain the pointwise approach. Second, from a machine learning

point of view, being an upper bound of the evaluation measure might not be sufficient for a loss function to be good. The reason is that what we really care about

is the optimal solution with regard to the loss function. Even if a loss is an

upper bound of a measure-based ranking error everywhere, its optimum may still be

far away from the optimum of the measure-based ranking error.
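This gap between bounding and optimizing can be sketched with a toy one-parameter example (hypothetical, not from the chapter): the surrogate below upper bounds a 0-1 ranking error everywhere, yet its minimizer sits exactly where the error is largest.

```python
# Toy 0-1 "ranking error": wrong whenever the score parameter is non-positive.
def ranking_error(theta):
    return 0.0 if theta > 0 else 1.0

# A smooth surrogate that upper bounds the error everywhere (it is always >= 1),
# but whose unique minimizer is theta = -1, where the error equals 1.
def surrogate_loss(theta):
    return (theta + 1.0) ** 2 + 1.0

thetas = [i / 100.0 - 5.0 for i in range(1001)]  # grid over [-5, 5]
assert all(surrogate_loss(t) >= ranking_error(t) for t in thetas)  # bound holds

best = min(thetas, key=surrogate_loss)
print(best, ranking_error(best))  # minimizer near -1.0, where the error is 1.0
```

Being a pointwise upper bound thus says nothing about where the minimizer lands, which is exactly why a stronger notion such as consistency is needed.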

The discussions on “tendency correlation” are one step toward solving the aforementioned second problem. However, the tendency-correlation condition still seems too strong and is sometimes unnecessary. A more principled solution

should be obtained by investigating the so-called “consistency” of the loss functions,

which exactly describes whether the optima with regard to the loss function and the

measure can be the same. Consistency of learning methods has been well studied in

classification, but not yet for ranking. More discussions on this topic can be found

in Part VI. Please note that the discussions on consistency are valid only when the

number of training examples approaches infinity. However, in practice, the number

of training examples is always limited. In this case, the theoretical analysis might

not always be in accordance with experimental results [9]. Therefore, more work is needed to predict the performance of a learning-to-rank method when the sample size is small.

5.6 Exercises

5.1 In this chapter, the relationship between (1 − MAP) and the pairwise loss functions, as well as the listwise loss function, has been discussed. However, nothing has been said about whether a similar relationship also exists between (1 − MAP) and the pointwise loss functions. Please discuss this by extending the results in [4] and [7].

5.2 Investigate whether one can obtain results similar to those presented in this chapter for other evaluation measures, such as MRR and RC.
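For Exercise 5.2, a minimal sketch of MRR (mean reciprocal rank) may be a useful starting point; the relevance lists below are hypothetical:

```python
def reciprocal_rank(rels):
    # Reciprocal of the position of the first relevant document (0 if none).
    for i, r in enumerate(rels):
        if r:
            return 1.0 / (i + 1)
    return 0.0

def mrr(queries):
    # Average the reciprocal rank over the per-query relevance lists.
    return sum(reciprocal_rank(q) for q in queries) / len(queries)

print(mrr([[0, 1, 0], [1, 0, 0]]))  # (1/2 + 1) / 2 = 0.75
```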

5.3 Show that although a zero value of the essential loss is a necessary and sufficient condition for zero values of (1 − NDCG) and (1 − MAP), when the essential loss is very small but non-zero, (1 − NDCG) and (1 − MAP) will not necessarily be very small.
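As a numerical starting point for Exercise 5.3, NDCG and MAP can be computed directly on small permutations and compared across orderings. Below is a minimal sketch using the standard definitions (DCG with gain 2^rel − 1 and log2 position discount); the relevance labels are hypothetical:

```python
import math

def dcg(rels):
    # Standard DCG: gain (2^rel - 1), position discount log2(rank + 1).
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels))

def ndcg(rels):
    # Normalize by the DCG of the ideal (descending-relevance) ordering.
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 1.0

def average_precision(rels):
    # Binary relevance: average of precision@k over the relevant positions.
    hits, total = 0, 0.0
    for i, r in enumerate(rels):
        if r:
            hits += 1
            total += hits / (i + 1)
    return total / hits if hits else 0.0

perfect = [1, 1, 0, 0]  # hypothetical binary relevance, best ordering
swapped = [1, 0, 1, 0]  # one relevant document pushed down
print(ndcg(perfect), average_precision(perfect))  # 1.0 for both
print(ndcg(swapped), average_precision(swapped))
```

Enumerating permutations of a slightly longer list and tabulating the essential loss next to (1 − NDCG) and (1 − MAP) makes the non-proportionality in Exercise 5.3 concrete.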