To compare these sampling strategies, the experiments in [2] select, for each query, documents from the complete collection at percentages ranging from 0.6% to 60%, forming subsets of different sizes according to each strategy. Five learning-to-rank algorithms are used for evaluation: RankBoost [14], Regression [9], Ranking SVM [20, 21], RankNet [4], and LambdaRank [11]. The experimental results lead to the following observations:
• With some sampling strategies, training sets as small as 1% to 2% of the complete collection are just as effective for learning to rank as the complete collection itself. This indicates that it is unnecessary to use the entire dataset for training.
• Hedge seems to be a less effective sampling strategy. Ranking functions trained on datasets constructed according to the hedge strategy reach their optimal performance only when trained over datasets that are at least 20% of the complete collection; in the worst cases, the performance of some ranking functions (e.g., RankBoost, Regression, and RankNet with a hidden layer) remains significantly below the optimum even when trained over 40% to 50% of the complete collection.
• The other sampling strategies work fairly well,¹ though they may perform a little worse with certain learning-to-rank algorithms. For example, the LETOR strategy does not perform very well when Ranking SVM is used, MTC does not perform very well when RankBoost and Regression are used, and infAP does not perform very well when RankBoost and RankNet with a hidden layer are used. Overall, however, their performances are acceptable and not very different from each other, especially when the sampling ratio is larger than 20%.
13.3.2.2 Data Selection by Optimizing PPC
In [ 16 ], Geng et al. argue that in order to improve the training performance through
data selection, one needs to first define a reasonable measure of the data quality.
Accordingly, a measure called pairwise preference consistency (PPC) is proposed,
whose definition is
$$\mathrm{PPC}(S) = \sum_{q,\tilde{q}} \frac{1}{m_q\, m_{\tilde{q}}} \sum_{u,v} \sum_{u',v'} \mathrm{sim}\bigl(x_u^q - x_v^q,\; x_{u'}^{\tilde{q}} - x_{v'}^{\tilde{q}}\bigr), \qquad (13.12)$$
where S is the training data collection and sim(·, ·) is a similarity function; a simple yet effective example is the inner product.
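To make the computation in (13.12) concrete, the following is a minimal Python/NumPy sketch, not the implementation from [16]: the function name ppc, the data layout, and the reading of m_q as the number of preference pairs per query are assumptions made here for illustration, and the inner product is used as sim(·, ·).

```python
import numpy as np

def ppc(queries):
    """Pairwise preference consistency of a training set (illustrative sketch).

    `queries` is a list of (X, pairs) tuples, one per query:
      X     -- (n_docs, n_features) feature matrix,
      pairs -- list of (u, v) index pairs meaning "document u is preferred to v".
    sim(., .) is taken to be the inner product.
    """
    # Preference-difference vectors x_u - x_v for each query.
    diffs = [np.array([X[u] - X[v] for u, v in pairs]) for X, pairs in queries]

    total = 0.0
    for d_q in diffs:
        for d_qt in diffs:
            # Sum over (u, v) and (u', v') of <x_u^q - x_v^q, x_u'^q~ - x_v'^q~>,
            # normalized by m_q * m_q~ (assumed here: number of pairs per query).
            total += (d_q @ d_qt.T).sum() / (len(d_q) * len(d_qt))
    return total

# Toy usage: two queries with two-dimensional features.
X1 = np.array([[1.0, 0.0], [0.2, 0.1], [0.0, 1.0]])
X2 = np.array([[0.9, 0.1], [0.1, 0.8]])
print(ppc([(X1, [(0, 1), (0, 2)]), (X2, [(0, 1)])]))
```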
Let us then see how to effectively optimize the PPC of the selected subset of data. In particular, a variable α_u is used to indicate whether or not document x_u is selected.
¹ Note that the authors of [2] misinterpreted their experimental results in the original paper. They claimed that the LETOR strategy was the second worst; however, according to the figure shown in the paper, the LETOR strategy performed very well compared with the other strategies.