Chapter 8
Semi-supervised Ranking
Abstract In this chapter, we introduce semi-supervised learning for ranking. The motivation for this topic is that we can always collect a large number of unlabeled documents or queries at low cost. It would be very helpful if one could leverage such unlabeled data in the learning-to-rank process. In this chapter, we mainly review a transductive approach and an inductive approach to this task, and discuss how to improve these approaches by taking the unique properties of ranking into consideration.
So far in the previous chapters, we have mainly discussed supervised learning for ranking. However, just as in classification, unlabeled data can sometimes help us reduce the amount of labeled data required. There have been some preliminary attempts [1, 2] at semi-supervised ranking.
8.1 Inductive Approach
In [1], an inductive approach is taken. More specifically, the ground-truth labels of the labeled documents are propagated to the unlabeled documents according to their mutual similarity in the feature space. The same technique has been widely used in semi-supervised classification [4, 7, 8].
In order to exploit information from the unlabeled dataset, it is assumed that an unlabeled document that is similar to a labeled document should have a label similar to that of the labeled document. One begins by selecting the unlabeled documents that are most similar to a labeled document x and assigning them its relevance judgment y. For ease of discussion, we refer to such unlabeled documents as automatically-labeled documents, and to the original labeled documents as human-labeled documents.
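As a minimal sketch of the propagation step described above, the following Python assumes Euclidean distance in the feature space as the (inverse) similarity measure and a fixed number k of neighbors per human-labeled document. For each labeled document x with judgment y, it selects the k closest unlabeled documents and gives them label y. The function names, the parameter `k`, and the rule for resolving conflicts (keep the label of the nearest labeled document) are hypothetical illustrations, not necessarily the choices made in [1]. In a real learning-to-rank setting, propagation would presumably also be restricted to documents associated with the same query.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Distance in feature space; smaller means more similar (assumed measure)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def propagate_labels(
    labeled: List[Tuple[Vector, int]],
    unlabeled: List[Vector],
    k: int = 1,
) -> List[Tuple[Vector, int]]:
    """Return automatically-labeled documents as (features, label) pairs.

    For each human-labeled document (x, y), the k unlabeled documents closest
    to x receive label y. Unlabeled documents never selected stay unlabeled.
    """
    # Maps an index into `unlabeled` to (distance, propagated label).
    best: Dict[int, Tuple[float, int]] = {}
    for x, y in labeled:
        # Rank all unlabeled documents by their distance to x.
        ranked = sorted(range(len(unlabeled)), key=lambda i: euclidean(x, unlabeled[i]))
        for i in ranked[:k]:
            d = euclidean(x, unlabeled[i])
            # Hypothetical tie-breaking rule: if several labeled documents
            # select the same unlabeled one, keep the label of the closest.
            if i not in best or d < best[i][0]:
                best[i] = (d, y)
    return [(unlabeled[i], label) for i, (_, label) in sorted(best.items())]


if __name__ == "__main__":
    labeled = [([0.0, 0.0], 0), ([5.0, 5.0], 2)]
    unlabeled = [[0.1, 0.2], [4.9, 5.1], [2.0, 2.0]]
    print(propagate_labels(labeled, unlabeled, k=1))
```

With `k=1`, the first two unlabeled documents inherit labels 0 and 2 respectively, while `[2.0, 2.0]` is left unlabeled; with `k=2`, it is also selected by both labeled documents and keeps label 0 from the closer one. The simple augmentation scheme discussed next would then train on `labeled + propagate_labels(labeled, unlabeled, k)`.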
After label propagation, a simple approach is to add these automatically-labeled documents to the original training set and then learn a ranking function as in the supervised case, e.g., using RankBoost [3]. However, this training scheme suffers from the following drawback. As the automatically-labeled documents have
error-prone labels, the ranking performance would be highly dependent on how ro-