Semi-supervised Ranking - Learning to Rank for Information Retrieval

Chapter 8

Semi-supervised Ranking

Abstract In this chapter, we introduce semi-supervised learning for ranking. The motivation for this topic comes from the fact that we can always collect a large number of unlabeled documents or queries at a low cost. It would be very helpful if one could leverage such unlabeled data in the learning-to-rank process. In this chapter, we mainly review a transductive approach and an inductive approach to this task, and discuss how to improve these approaches by taking the unique properties of ranking into consideration.

So far in the previous chapters of the book, we have mainly discussed supervised learning for ranking. However, just as in classification, unlabeled data can sometimes help us reduce the volume of labeled data required. There have been some preliminary attempts [1, 2] at semi-supervised ranking.

8.1 Inductive Approach

In [1], an inductive approach is taken. More specifically, the ground-truth labels of the labeled documents are propagated to the unlabeled documents according to their mutual similarity in the feature space. The same technique has been widely used in semi-supervised classification [4, 7, 8].

In order to exploit information from the unlabeled dataset, it is assumed that an unlabeled document that is similar to a labeled document should have a label similar to that of the labeled document. One begins by selecting the unlabeled documents that are most similar to a labeled document x and assigning them the corresponding relevance judgment y. For ease of discussion, we refer to such unlabeled documents as automatically-labeled documents, and to the original labeled documents as human-labeled documents.
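The propagation step described above can be illustrated with a minimal sketch. The text does not specify the similarity measure, how many unlabeled documents are selected per labeled document, or how conflicts are resolved, so the cosine similarity, the parameter k, and the rule of keeping the label from the most similar labeled document are all assumptions made here for illustration. The function names are hypothetical and are not taken from [1].

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors.

    Assumption: the source only says "similarity in the feature space";
    cosine similarity is one possible choice.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def propagate_labels(labeled, unlabeled, k=1):
    """Copy each label y to the k unlabeled documents most similar to x.

    labeled   -- list of (feature_vector, relevance_label) pairs
    unlabeled -- list of feature vectors
    k         -- hypothetical parameter: documents selected per labeled one

    Returns (feature_vector, label, similarity) triples for the
    automatically-labeled documents, ordered by their index in `unlabeled`.
    """
    best = {}  # unlabeled index -> (similarity, propagated label)
    for x, y in labeled:
        scored = sorted(
            ((cosine_similarity(x, u), i) for i, u in enumerate(unlabeled)),
            reverse=True,
        )
        for sim, i in scored[:k]:
            # Assumption: if several labeled documents select the same
            # unlabeled document, keep the label from the most similar one.
            if i not in best or sim > best[i][0]:
                best[i] = (sim, y)
    return [(unlabeled[i], label, sim) for i, (sim, label) in sorted(best.items())]


# Two human-labeled documents and three unlabeled ones (toy data).
human_labeled = [([1.0, 0.0], 2), ([0.0, 1.0], 0)]
unlabeled_docs = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
auto_labeled = propagate_labels(human_labeled, unlabeled_docs, k=1)
```

With k = 1, the third toy document is not the nearest neighbor of either labeled document, so it stays unlabeled. The similarity is returned alongside each propagated label so that a later training step can judge how much to trust it.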

After the label propagation, a simple approach is to add these automatically-labeled documents to the original training set and then learn a ranking function as in the supervised case, e.g., using RankBoost [3]. However, this training scheme suffers from the following drawback. As the automatically-labeled documents have error-prone labels, the ranking performance would be highly dependent on how ro-
