Semi-supervised Ranking - Learning to Rank for Information Retrieval

• The proposed transductive approach does not improve every query. Its overall gain arises because more queries are improved than degraded.

• The transductive approach requires online computation. The total time per query is on the order of hundreds of seconds for the datasets under investigation. Better code optimization or novel distributed algorithms are therefore needed to make the approach practically applicable.
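The first observation above can be checked with a simple per-query comparison. The following is a minimal sketch, not the evaluation code of the cited work: the function name `compare_per_query`, the tolerance `tol`, and the metric values in the usage example are all hypothetical and made up for illustration.

```python
def compare_per_query(baseline, proposed, tol=1e-9):
    """Count queries that improve, degrade, or stay unchanged.

    `baseline` and `proposed` map a query id to a per-query metric value
    (for example, average precision). Both must cover the same query ids.
    """
    improved = sum(1 for q in baseline if proposed[q] - baseline[q] > tol)
    degraded = sum(1 for q in baseline if baseline[q] - proposed[q] > tol)
    unchanged = len(baseline) - improved - degraded
    return improved, degraded, unchanged


# Illustrative (made-up) per-query scores, not results from any paper.
baseline = {"q1": 0.40, "q2": 0.55, "q3": 0.30, "q4": 0.62}
proposed = {"q1": 0.48, "q2": 0.50, "q3": 0.37, "q4": 0.62}
improved, degraded, unchanged = compare_per_query(baseline, proposed)
```

An average gain can coexist with a non-trivial number of degraded queries, which is why the counts are worth reporting alongside the mean.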

8.3 Discussions

As we can see, the above works borrow concepts and algorithms from semi-supervised classification. Although good ranking performance has been observed, the validity of doing so may need further justification. For example, since similarity is essential to many classification algorithms (i.e., “similar documents should have the same class label”), it looks natural and reasonable to propagate labels across similar documents. However, in ranking, similarity does not play the same central role; preference seems to be more fundamental than similarity. The question is then whether it is still natural and reasonable to conduct similarity-based label propagation for semi-supervised ranking.
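To make the notion of similarity-based label propagation concrete, the following is a minimal sketch of a generic propagation scheme, not the specific algorithm of any work cited here. It assumes a Gaussian (RBF) similarity with a hypothetical bandwidth parameter `gamma`, real-valued relevance labels, and clamping of the labeled documents; the function names are illustrative.

```python
import math


def rbf_similarity(a, b, gamma=1.0):
    """Gaussian (RBF) similarity between two feature vectors (an assumed choice)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def propagate_labels(features, labels, gamma=1.0, n_iter=50):
    """Propagate relevance labels over a fully connected similarity graph.

    labels[i] is a number for a labeled document and None for an unlabeled one.
    Labeled documents are clamped to their given value in every iteration;
    each unlabeled document takes the similarity-weighted mean of the others.
    """
    n = len(features)
    scores = [0.0 if y is None else float(y) for y in labels]
    for _ in range(n_iter):
        updated = []
        for i in range(n):
            if labels[i] is not None:
                updated.append(float(labels[i]))  # clamp known labels
                continue
            weights = [rbf_similarity(features[i], features[j], gamma)
                       for j in range(n) if j != i]
            neighbours = [scores[j] for j in range(n) if j != i]
            total = sum(weights)
            if total > 0:
                updated.append(sum(w * s for w, s in zip(weights, neighbours)) / total)
            else:
                updated.append(0.0)  # no similar neighbour at all
        scores = updated
    return scores


# Two clusters in a 1-D feature space: one labeled relevant, one non-relevant.
features = [[0.0], [0.1], [5.0], [5.1]]
labels = [1.0, None, 0.0, None]
scores = propagate_labels(features, labels)
```

Note that the propagated scores depend only on feature similarity. Nothing in this scheme encodes a preference between two documents, which is exactly the concern raised above.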

Furthermore, in classification, if we do not have class labels, we know nothing about the conditional probability p(y | x). However, in ranking, even if we do not have ground-truth labels, we still have several very strong rankers, such as BM25 [6] and LMIR [5], which can give us a reasonable guess as to which document should be ranked higher. In other words, we have some knowledge about the unlabeled data. If we can incorporate such knowledge into the semi-supervised ranking process, we may have the chance to do a better job.
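One hypothetical way to use such knowledge is to turn the scores of an unsupervised ranker into pseudo-preference pairs over unlabeled documents. The sketch below is an illustration under stated assumptions, not a method proposed in the text: it uses a commonly used BM25 variant with a non-negative IDF, conventional default parameters `k1=1.2` and `b=0.75`, pre-tokenized input, and a hypothetical `margin` threshold for deciding when a score gap is large enough to trust.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score tokenized documents against a tokenized query with a BM25 variant."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    terms = set(query)
    # Document frequency of each query term.
    df = {t: sum(1 for d in docs if t in d) for t in terms}
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in terms:
            if tf[t] == 0:
                continue
            # Non-negative IDF variant (an assumption of this sketch).
            idf = math.log(1.0 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            length_norm = k1 * (1.0 - b + b * len(d) / avg_len)
            score += idf * tf[t] * (k1 + 1.0) / (tf[t] + length_norm)
        scores.append(score)
    return scores


def pseudo_preferences(scores, margin=0.5):
    """Return pairs (i, j) meaning 'document i is preferred to document j'.

    A pair is emitted only when the score gap is at least `margin`, so that
    near-ties from the base ranker are not turned into training signal.
    """
    n = len(scores)
    return [(i, j) for i in range(n) for j in range(n)
            if i != j and scores[i] - scores[j] >= margin]


query = ["ranking", "model"]
docs = [["ranking", "model", "ranking"],
        ["cooking", "recipes", "pasta"],
        ["model", "train"]]
scores = bm25_scores(query, docs)
prefs = pseudo_preferences(scores, margin=0.5)
```

Pairs produced this way could be mixed with labeled preferences in a pairwise learner, ideally with a lower weight, since the base ranker's guesses are noisy.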

References

1. Amini, M.R., Truong, T.V., Goutte, C.: A boosting algorithm for learning bipartite ranking functions with partially labeled data. In: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2008), pp. 99-106 (2008)

2. Duh, K., Kirchhoff, K.: Learning to rank with partially-labeled data. In: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2008), pp. 251-258 (2008)

3. Freund, Y., Iyer, R., Schapire, R., Singer, Y.: An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research 4, 933-969 (2003)

4. Niu, Z.Y., Ji, D.H., Tan, C.L.: Word sense disambiguation using label propagation based semi-supervised learning. In: Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005), pp. 395-402 (2005)

5. Ponte, J.M., Croft, W.B.: A language modeling approach to information retrieval. In: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 1998), pp. 275-281 (1998)

6. Robertson, S.E.: Overview of the Okapi projects. Journal of Documentation 53(1), 3-7 (1997)
