To compare these sampling strategies, the experiments in [2] select, for each query, documents from the complete collection at percentages ranging from 0.6% to 60%, forming subsets of different sizes according to each strategy. Five learning-to-rank algorithms are used for evaluation: RankBoost [14], Regression [9], Ranking SVM [20, 21], RankNet [4], and LambdaRank [11]. The experimental results lead to the following observations:
• With some sampling strategies, training sets as small as 1% to 2% of the complete collection are just as effective for learning to rank as the complete collection itself. This indicates that it is unnecessary to use the entire dataset for training.
• Hedge seems to be a less effective sampling strategy. Ranking functions trained on datasets constructed according to the hedge strategy reach their optimal performance only when trained over datasets that are at least 20% of the complete collection; in the worst cases, the performance of some ranking functions (e.g., RankBoost, Regression, and RankNet with a hidden layer) remains significantly below the optimum even when trained over 40% to 50% of the complete collection.
• The other sampling strategies work fairly well,¹ though they may perform a little worse with certain learning-to-rank algorithms. For example, the LETOR strategy does not perform very well when Ranking SVM is used, MTC does not perform very well when RankBoost and Regression are used, and infAP does not perform very well when RankBoost and RankNet with a hidden layer are used. Overall, however, their performances are acceptable and not very different from each other, especially when the sampling ratio is larger than 20%.
13.3.2.2 Data Selection by Optimizing PPC
In [ 16 ], Geng et al. argue that in order to improve the training performance through
data selection, one needs to first define a reasonable measure of the data quality.
Accordingly, a measure called pairwise preference consistency (PPC) is proposed,
whose definition is
$$\mathrm{PPC}(S) = \sum_{q,\tilde{q}} \frac{1}{m_q\, m_{\tilde{q}}} \sum_{u,v} \sum_{u',v'} \mathrm{sim}\bigl(x_u^q - x_v^q,\; x_{u'}^{\tilde{q}} - x_{v'}^{\tilde{q}}\bigr), \qquad (13.12)$$
where S is the training data collection and sim(·, ·) is a similarity function; a simple yet effective example is the inner product.
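To make the computation in (13.12) concrete, the following is a minimal Python/NumPy sketch, not the implementation from [16]: the function name ppc, the data layout, and the reading of m_q as the number of preference pairs per query are assumptions made here for illustration, and the inner product is used as sim(·, ·).

```python
import numpy as np

def ppc(queries):
    """Pairwise preference consistency of a training set (illustrative sketch).

    `queries` is a list of (X, pairs) tuples, one per query:
      X     -- (n_docs, n_features) feature matrix,
      pairs -- list of (u, v) index pairs meaning "document u is preferred to v".
    sim(., .) is taken to be the inner product.
    """
    # Preference-difference vectors x_u - x_v for each query.
    diffs = [np.array([X[u] - X[v] for u, v in pairs]) for X, pairs in queries]

    total = 0.0
    for d_q in diffs:
        for d_qt in diffs:
            # Sum over (u, v) and (u', v') of <x_u^q - x_v^q, x_u'^q~ - x_v'^q~>,
            # normalized by m_q * m_q~ (assumed here: number of pairs per query).
            total += (d_q @ d_qt.T).sum() / (len(d_q) * len(d_qt))
    return total

# Toy usage: two queries with two-dimensional features.
X1 = np.array([[1.0, 0.0], [0.2, 0.1], [0.0, 1.0]])
X2 = np.array([[0.9, 0.1], [0.1, 0.8]])
print(ppc([(X1, [(0, 1), (0, 2)]), (X2, [(0, 1)])]))
```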
Let us then see how to effectively optimize the PPC of the selected subset of data. In particular, a variable α_u is used to indicate whether or not document x_u is selected.
¹ Note that the authors of [2] misinterpreted their experimental results in the original paper. They claimed that the LETOR strategy was the second worst; however, according to the figure shown in the paper, the LETOR strategy performed very well compared with the other strategies.