Experimental Results on LETOR
Abstract In this chapter, we take the official evaluation results published on the
LETOR website as the source for a discussion of the performance of different
learning-to-rank methods.
11.1 Experimental Settings
Three widely used measures are adopted for evaluation on the LETOR datasets:
P@k [1], MAP [1], and NDCG@k [6]. For a given ranking model, the evaluation
results in terms of these three measures can be computed by the official evaluation
tool provided in LETOR.
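As a rough illustration of how these three measures are computed, the following sketch implements them in a common form (the 2^rel − 1 gain and log2 discount for NDCG). The official LETOR evaluation tool may differ in details such as tie-breaking and the handling of queries with no relevant documents, so this is an assumption-laden sketch, not the official tool.

```python
import math

def precision_at_k(labels, k):
    # P@k: fraction of the top-k ranked documents that are relevant (label > 0);
    # `labels` is the list of ground-truth labels in ranked order.
    return sum(1 for l in labels[:k] if l > 0) / k

def average_precision(labels):
    # Average precision for one query; MAP is its mean over all queries.
    hits, total = 0, 0.0
    for i, l in enumerate(labels, start=1):
        if l > 0:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0

def ndcg_at_k(labels, k):
    # NDCG@k with gain 2^label - 1 and discount 1/log2(rank + 1).
    def dcg(ls):
        return sum((2 ** l - 1) / math.log2(i + 2) for i, l in enumerate(ls[:k]))
    ideal = dcg(sorted(labels, reverse=True))
    return dcg(labels) / ideal if ideal > 0 else 0.0
```

For example, a ranking whose labels are already in descending order attains NDCG@k of 1.0, since its DCG equals the ideal DCG.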
The LETOR official baselines include several learning-to-rank algorithms: linear
regression, belonging to the pointwise approach; Ranking SVM [5, 7], RankBoost
[4], and FRank [8], belonging to the pairwise approach; and ListNet [2], AdaRank
[10], and SVMmap [11], belonging to the listwise approach. To make fair compar-
isons, the same settings are adopted for all the algorithms. First, most algorithms
use a linear scoring function, except RankBoost and FRank, which use binary
weak rankers. Second, all the algorithms use MAP on the validation set for model
selection. Some detailed experimental settings are listed here.
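The shared protocol can be sketched in a few lines: every baseline scores a document by a linear function of its features, and the final model is the candidate (a regularization setting, an iteration count, etc.) with the highest MAP on the validation set. The function names here are illustrative assumptions, not LETOR APIs.

```python
def linear_score(w, x):
    # Linear scoring function f(x) = w . x shared by most LETOR baselines.
    return sum(wi * xi for wi, xi in zip(w, x))

def select_by_validation_map(candidates, map_on_validation):
    # Model selection used by all baselines: keep the candidate whose
    # MAP on the validation set is highest; `map_on_validation` is a
    # callable evaluating one candidate (assumed interface).
    return max(candidates, key=map_on_validation)
```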
As for linear regression, the validation set is used to select a good mapping from
the ground-truth labels to real values.
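To make this concrete, one plausible reading is that each candidate mapping assigns a real-valued target to every graded label, a least-squares model is fit against those targets, and the mapping whose model scores highest on validation MAP is kept. The mapping dictionaries and the least-squares fit below are assumptions for illustration, not the official setup.

```python
import numpy as np

def fit_regression(X, labels, mapping):
    # Map graded ground-truth labels (e.g., {0, 1, 2}) to real-valued
    # targets via `mapping`, then fit a linear model by least squares.
    y = np.array([mapping[l] for l in labels], dtype=float)
    w, *_ = np.linalg.lstsq(np.asarray(X, dtype=float), y, rcond=None)
    return w

# Candidate mappings to compare on the validation set (hypothetical):
candidate_mappings = [{0: 0.0, 1: 1.0, 2: 2.0}, {0: 0.0, 1: 1.0, 2: 3.0}]
```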
For Ranking SVM, the publicly available SVMlight tool is employed, and the
validation set is used to tune the parameter λ in its loss function.
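The objective being tuned is the standard Ranking SVM loss: a λ-weighted squared norm of the weight vector plus a hinge loss over preference pairs (SVMlight itself exposes the trade-off via a C parameter, so this λ form is a notational sketch rather than the tool's exact interface):

```python
def ranking_svm_loss(w, pairs, lam):
    # Ranking SVM objective: lam * ||w||^2 plus, for each preference
    # pair (xi preferred over xj), the hinge loss max(0, 1 - w.(xi - xj)).
    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))
    hinge = sum(max(0.0, 1.0 - dot(w, [a - b for a, b in zip(xi, xj)]))
                for xi, xj in pairs)
    return lam * dot(w, w) + hinge
```

A larger λ penalizes model complexity more heavily, which is why it is the natural knob for validation-set tuning.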
For RankBoost, the weak ranker is defined on the basis of a single feature with
255 possible thresholds. The validation set is used to determine the best number
of iterations.
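Such threshold-based weak rankers can be sketched as follows; the even spacing of thresholds over the feature's observed range is an assumption, since the original text does not specify how the 255 thresholds are placed:

```python
def make_weak_rankers(feature_values, num_thresholds=255):
    # Candidate binary weak rankers h(x) = 1 if x > theta else 0, all
    # defined on a single feature; thresholds are assumed to be evenly
    # spaced over the feature's observed range.
    lo, hi = min(feature_values), max(feature_values)
    step = (hi - lo) / (num_thresholds + 1)
    thresholds = [lo + step * (i + 1) for i in range(num_thresholds)]
    return [(lambda x, t=t: 1.0 if x > t else 0.0) for t in thresholds]
```

At each boosting round, RankBoost selects the weak ranker from this pool that best reduces the pairwise loss on the current distribution over document pairs.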
For FRank, the validation set is used to determine the number of weak learners in
the generalized additive model.
Note that there have been several other empirical studies [9, 12] in the literature,
based on LETOR and other datasets. The conclusions drawn from these studies are
similar to what we will introduce in this chapter.