Table 11.1  Results on the TD2003 dataset

Algorithm    NDCG@1  NDCG@3  NDCG@10  P@1    P@3    P@10   MAP
Regression   0.320   0.307   0.326    0.320  0.260  0.178  0.241
RankSVM      0.320   0.344   0.346    0.320  0.293  0.188  0.263
RankBoost    0.280   0.325   0.312    0.280  0.280  0.170  0.227
FRank        0.300   0.267   0.269    0.300  0.233  0.152  0.203
ListNet      0.400   0.337   0.348    0.400  0.293  0.200  0.275
AdaRank      0.260   0.307   0.306    0.260  0.260  0.158  0.228
SVM^map      0.320   0.320   0.328    0.320  0.253  0.170  0.245
For ListNet, the validation set is used to determine the best mapping from the
ground-truth labels to scores, as required by the Plackett-Luce model, and to
determine the optimal number of iterations in the gradient descent process.
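To make the role of this label-to-score mapping concrete, here is a minimal sketch of the Plackett-Luce top-one probabilities and of ListNet's cross-entropy loss, assuming an illustrative identity mapping from graded labels to ground-truth scores; the function names and numbers below are not part of the LETOR tools.

```python
import numpy as np

def top_one_probs(scores):
    """Plackett-Luce top-one probabilities: a softmax over the document scores."""
    e = np.exp(scores - np.max(scores))  # shift for numerical stability
    return e / e.sum()

def listnet_loss(label_scores, model_scores):
    """Cross entropy between the two top-one distributions (ListNet's listwise loss)."""
    p_true = top_one_probs(label_scores)
    p_model = top_one_probs(model_scores)
    return -np.sum(p_true * np.log(p_model + 1e-12))

# Hypothetical mapping from graded labels {0, 1, 2} to ground-truth scores;
# in the experiments this mapping is chosen on the validation set.
labels = np.array([2, 0, 1, 0])
label_scores = labels.astype(float)            # e.g., identity mapping
model_scores = np.array([1.3, -0.2, 0.8, 0.1])  # illustrative model outputs
print(listnet_loss(label_scores, model_scores))
```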
For AdaRank, MAP is set as the evaluation measure to be optimized, and the
validation set is used to determine the number of iterations.
For SVM^map, the publicly available tool SVM^map (http://projects.yisongyue.com/
svmmap/) is employed, and the validation set is used to determine the parameter
λ in its loss function.
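The validation-based selection of the iteration count used for ListNet and AdaRank can be sketched as follows; the two callables are placeholders standing in for one training step and for an evaluation of MAP on the validation set, not functions from any released tool.

```python
def select_num_iterations(train_one_iteration, map_on_validation, max_iters):
    """Choose the iteration count with the best validation MAP.

    train_one_iteration(): performs one boosting round / gradient descent step.
    map_on_validation():   returns MAP of the current model on the validation set.
    Both callables are supplied by the caller (placeholders in this sketch).
    """
    best_iter, best_map = 0, float("-inf")
    for t in range(1, max_iters + 1):
        train_one_iteration()
        val_map = map_on_validation()
        if val_map > best_map:
            best_iter, best_map = t, val_map
    return best_iter, best_map

# Toy usage with dummy callables simulating a model that improves, then overfits.
history = iter([0.21, 0.24, 0.26, 0.25, 0.23])
print(select_num_iterations(lambda: None, lambda: next(history), max_iters=5))
# -> (3, 0.26)
```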
11.2 Experimental Results on LETOR 3.0
The ranking performances of the aforementioned algorithms on the LETOR 3.0
datasets are listed in Tables 11.1, 11.2, 11.3, 11.4, 11.5, 11.6, and 11.7. According to
these experimental results, we find that the listwise ranking algorithms perform very
well on most datasets. Among the three listwise ranking algorithms, ListNet seems
to be better than the other two. AdaRank and SVM^map obtain similar performances.
Pairwise ranking algorithms obtain good ranking accuracy on some (although not
all) datasets. For example, RankBoost offers the best performance on TD2004 and
NP2003; Ranking SVM shows very promising results on NP2003 and NP2004; and
FRank achieves very good results on TD2004 and NP2004. By comparison, simple
linear regression performs worse than the pairwise and listwise ranking algorithms
on most datasets.
We have also observed that most ranking algorithms perform differently on different
datasets: they may perform very well on some datasets, but not so well on others.
To evaluate the overall ranking performance of an algorithm, we use the number of
other algorithms that it can beat over all seven datasets as a measure. That is,
S_i(M) = \sum_{j=1}^{7} \sum_{k=1}^{7} I_{\{M_i(j) > M_k(j)\}},

where j is the index of a dataset, i and k are the indexes of algorithms, M_i(j) is the
performance of the i-th algorithm on the j-th dataset, and I_{\{\cdot\}} is the indicator
function.
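A minimal sketch of this measure S_i(M) is given below, assuming the per-dataset performances M_i(j) are collected in a matrix with one row per algorithm and one column per dataset; the toy values are illustrative only.

```python
import numpy as np

def winning_number(M):
    """M[i, j] = performance of algorithm i on dataset j (e.g., its MAP score).

    Returns S_i(M): for each algorithm i, the number of (dataset, algorithm)
    pairs (j, k) with M[i, j] > M[k, j].
    """
    num_algos, num_datasets = M.shape
    S = np.zeros(num_algos, dtype=int)
    for i in range(num_algos):
        for j in range(num_datasets):
            # count the algorithms beaten by algorithm i on dataset j
            S[i] += int(np.sum(M[i, j] > M[:, j]))
    return S

# Toy example: 3 algorithms evaluated on 2 datasets (illustrative values).
M = np.array([[0.32, 0.25],
              [0.34, 0.21],
              [0.40, 0.27]])
print(winning_number(M))  # prints [1 1 4]
```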