The proposed transductive approach does not improve every query. Its overall gain comes from the fact that more queries are improved than degraded.
The transductive approach requires online computation: the total time per query is on the order of hundreds of seconds for the datasets under investigation. Better code optimization or novel distributed algorithms are therefore needed to make the approach practical.
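The observation that the gain comes from more improved than degraded queries can be checked with a simple per-query win/tie/loss count. The sketch below assumes that a per-query effectiveness score (for example NDCG) is available for both a baseline and the transductive ranker. The function name `win_loss` and all numbers are hypothetical, invented purely for illustration; they are not results from the experiments discussed here.

```python
def win_loss(baseline, transductive, eps=1e-9):
    """Count queries improved, degraded, or unchanged by a per-query metric.

    baseline, transductive: dicts mapping query id -> metric value (e.g. NDCG).
    """
    wins = losses = ties = 0
    for q, base in baseline.items():
        diff = transductive[q] - base
        if diff > eps:
            wins += 1
        elif diff < -eps:
            losses += 1
        else:
            ties += 1
    return wins, losses, ties


# Toy per-query values, invented for illustration only.
baseline = {"q1": 0.40, "q2": 0.55, "q3": 0.70, "q4": 0.30}
transductive = {"q1": 0.48, "q2": 0.52, "q3": 0.75, "q4": 0.30}

wins, losses, ties = win_loss(baseline, transductive)
mean_gain = sum(transductive[q] - baseline[q] for q in baseline) / len(baseline)
print(wins, losses, ties, round(mean_gain, 4))  # prints: 2 1 1 0.025
```

In this toy case one query gets worse. The mean still rises because two queries improve, which is the pattern the text describes.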
8.3 Discussions
As we can see, the above works borrow concepts and algorithms from semi-supervised classification. Although they achieve good ranking performance, the validity of doing so may need further justification. For example, since similarity is essential to many classification algorithms (i.e., “similar documents should have the same class label”), it seems natural and reasonable to propagate labels across similar documents. In ranking, however, similarity does not play the same central role; preference seems to be more fundamental than similarity. The question, then, is whether it is still natural and reasonable to conduct similarity-based label propagation for semi-supervised ranking.
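To make "propagating labels across similar documents" concrete, here is a minimal sketch of one generic iterative label-propagation scheme. It is not necessarily the exact formulation used in the works above.

It makes several assumptions:

- Documents are represented by numeric feature vectors.
- Similarity is a Gaussian kernel with a hypothetical bandwidth `sigma`.
- Labeled documents keep (are "clamped" to) their given relevance.
- Each unlabeled document repeatedly takes the similarity-weighted average of all other documents' scores.

The names `gaussian_similarity` and `propagate_labels` are illustrative only.

```python
import math


def gaussian_similarity(x, y, sigma=1.0):
    """Gaussian (RBF) similarity between two feature vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2 * sigma ** 2))


def propagate_labels(features, labels, sigma=1.0, iters=100):
    """Iterative label propagation with clamped labeled nodes.

    features: list of feature vectors, one per document.
    labels:   list of relevance values, or None for unlabeled documents.
    Returns one propagated score per document.
    """
    n = len(features)
    # Fully connected similarity graph without self-loops.
    W = [[0.0 if i == j else gaussian_similarity(features[i], features[j], sigma)
          for j in range(n)] for i in range(n)]
    # Unlabeled documents start at 0.0.
    scores = [lab if lab is not None else 0.0 for lab in labels]
    for _ in range(iters):
        new_scores = []
        for i in range(n):
            if labels[i] is not None:
                new_scores.append(labels[i])  # clamp labeled documents
                continue
            total = sum(W[i])
            if total == 0.0:
                new_scores.append(scores[i])  # isolated node: keep its score
            else:
                new_scores.append(sum(W[i][j] * scores[j] for j in range(n)) / total)
        scores = new_scores
    return scores


# Toy 1-D example: docs 0 and 2 are labeled 1.0 and 0.0; the rest are unlabeled.
features = [[0.0], [0.1], [1.0], [1.1], [0.5]]
labels = [1.0, None, 0.0, None, None]
scores = propagate_labels(features, labels, sigma=0.3)
print([round(s, 2) for s in scores])
```

The unlabeled document near the relevant one ends up with a high score, and the one near the irrelevant document ends up with a low score. This is exactly the similarity assumption that the paragraph above questions for ranking.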
Furthermore, in classification, if we do not have class labels, we know nothing about the conditional probability p(y | x). In ranking, however, even without ground-truth labels we still have several strong rankers, such as BM25 [6] and LMIR [5], which can give a reasonable guess as to which document should be ranked higher. In other words, we already have some knowledge about the unlabeled data. If we can incorporate such knowledge into the semi-supervised ranking process, we may be able to do a better job.
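As one illustration of how an unsupervised ranker could supply such knowledge, the sketch below does two things:

1. It scores unlabeled documents with BM25.
2. It turns confident score gaps into pseudo pairwise preferences that a pairwise learner could consume.

Several parts of this are assumptions rather than anything specified in the text:

- The parameter values `k1=1.2` and `b=0.75` are common defaults.
- The non-negative IDF form is one of several BM25 IDF variants.
- Whitespace tokenization is a simplification.
- The `margin` threshold and the idea of emitting preferences this way are hypothetical choices, not the method of any cited work.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with BM25 (whitespace tokens)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency of each term.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        dl = len(toks)
        s = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            # Non-negative IDF variant (one of several in use).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            s += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * dl / avgdl))
        scores.append(s)
    return scores


def pseudo_preferences(scores, margin=0.1):
    """Emit (i, j) meaning 'doc i preferred over doc j' when the gap exceeds margin."""
    prefs = []
    for i in range(len(scores)):
        for j in range(len(scores)):
            if scores[i] - scores[j] > margin:
                prefs.append((i, j))
    return prefs


docs = [
    "semi supervised ranking with unlabeled data",
    "ranking documents for web search",
    "cooking recipes for pasta",
]
scores = bm25_scores("semi supervised ranking", docs)
prefs = pseudo_preferences(scores, margin=0.1)
print(prefs)  # prints: [(0, 1), (0, 2), (1, 2)]
```

The margin keeps only pairs on which BM25 is relatively confident. This reflects the hedge in the text that such rankers give a *reasonable guess* rather than ground truth.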
References
1. Amini, M.R., Truong, T.V., Goutte, C.: A boosting algorithm for learning bipartite ranking functions with partially labeled data. In: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2008), pp. 99-106 (2008)
2. Duh, K., Kirchhoff, K.: Learning to rank with partially-labeled data. In: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2008), pp. 251-258 (2008)
3. Freund, Y., Iyer, R., Schapire, R., Singer, Y.: An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research 4, 933-969 (2003)
4. Niu, Z.Y., Ji, D.H., Tan, C.L.: Word sense disambiguation using label propagation based semi-supervised learning. In: Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005), pp. 395-402 (2005)
5. Ponte, J.M., Croft, W.B.: A language modeling approach to information retrieval. In: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 1998), pp. 275-281 (1998)
6. Robertson, S.E.: Overview of the Okapi projects. Journal of Documentation 53(1), 3-7 (1997)