Introduction - Learning to Rank for Information Retrieval

search engine (see Fig. 1.2). If one wants to know more about widely used features, please refer to Tables 10.2 and 10.3 in Chap. 10.

Even if a feature is the output of an existing retrieval model, in the context of learning to rank one assumes that the parameters of that model are fixed, and only the optimal way of combining these features is learned. In this sense, previous works on automatically tuning the parameters of existing models [36, 75] are not categorized as “learning-to-rank” methods.

The capability of combining a large number of features is an advantage of learning-to-rank methods. It is easy to incorporate any new progress on a retrieval model by including the output of the model as one dimension of the features. Such a capability is highly desirable for real search engines, since it is almost impossible to satisfy the complex information needs of Web users with only a few factors.
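As a minimal sketch of this idea, assume three toy stand-ins for existing retrieval models (the functions `term_overlap`, `length_prior`, and `exact_phrase` are hypothetical, not taken from this book). Each model's output, computed with its own parameters held fixed, becomes one feature dimension, and a linear scorer combines the features with weights that a learning-to-rank method would fit.

```python
from typing import Callable, List

# Hypothetical stand-ins for existing retrieval models. Each maps (query, doc)
# to a score; any internal parameters are held FIXED and are never tuned here.
def term_overlap(query: str, doc: str) -> float:
    """Number of query terms that also occur in the document."""
    doc_terms = set(doc.lower().split())
    return float(sum(1 for t in query.lower().split() if t in doc_terms))

def length_prior(query: str, doc: str, k: float = 10.0) -> float:
    """Toy length prior n / (n + k); k is a fixed model parameter."""
    n = len(doc.split())
    return n / (n + k)

def exact_phrase(query: str, doc: str) -> float:
    """A 'new' model: 1.0 if the whole query occurs verbatim in the document."""
    return 1.0 if query.lower() in doc.lower() else 0.0

FeatureFn = Callable[[str, str], float]

def feature_vector(query: str, doc: str, models: List[FeatureFn]) -> List[float]:
    # Each model's output becomes one dimension of the feature vector.
    return [m(query, doc) for m in models]

def score(weights: List[float], features: List[float]) -> float:
    # Linear combination: the weights are the only quantities a learner would fit.
    return sum(w * x for w, x in zip(weights, features))

models: List[FeatureFn] = [term_overlap, length_prior]
# Incorporating progress on a retrieval model = appending its output as a new dimension.
models.append(exact_phrase)

x = feature_vector("learning to rank", "a book on learning to rank for IR", models)
print(x)  # three features: overlap count, length prior, phrase match
```

Adding `exact_phrase` only extends the feature vector; the existing models stay untouched, and the learner would simply refit the weight vector passed to `score`.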

Discriminative Training

“Discriminative training” means that the learning process can be well described by the four components of discriminative learning mentioned in the previous subsection. That is, a learning-to-rank method has its own input space, output space, hypothesis space, and loss function.
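To make the four components concrete, here is a minimal sketch of one possible instantiation. The choices are illustrative assumptions, not a specific method from this book: feature vectors as the input space, real-valued scores as the output space, linear functions as the hypothesis space, and a pairwise hinge loss minimized by subgradient descent. The names `f`, `pairs`, `pairwise_hinge_loss`, and `train` are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Labeled = Tuple[Vector, int]  # (feature vector, relevance label) for one document

# Input space: a feature vector per (query, document) pair.
# Output space: a real-valued score per document; sorting by score gives a ranking.
# Hypothesis space: linear functions f(x) = w . x.
# Loss function (illustrative choice): pairwise hinge loss
#   max(0, 1 - (f(x_a) - f(x_b))) for each pair where a is more relevant than b.

def f(w: Vector, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))

def pairs(docs: List[Labeled]) -> List[Tuple[Vector, Vector]]:
    """All (more relevant, less relevant) document pairs within one query."""
    return [(xa, xb) for xa, ya in docs for xb, yb in docs if ya > yb]

def pairwise_hinge_loss(w: Vector, docs: List[Labeled]) -> float:
    return sum(max(0.0, 1.0 - (f(w, xa) - f(w, xb))) for xa, xb in pairs(docs))

def train(queries: List[List[Labeled]], dim: int,
          lr: float = 0.1, epochs: int = 100) -> Vector:
    """Subgradient descent on the pairwise hinge loss summed over all queries."""
    w = [0.0] * dim
    for _ in range(epochs):
        for docs in queries:
            for xa, xb in pairs(docs):
                if f(w, xa) - f(w, xb) < 1.0:  # margin violated: loss term is active
                    w = [wi + lr * (a - b) for wi, a, b in zip(w, xa, xb)]
    return w

# Toy training data for a single query: three documents with graded labels.
docs = [([1.0, 0.0], 2), ([0.5, 0.5], 1), ([0.0, 1.0], 0)]
w = train([docs], dim=2)
ranking = sorted(docs, key=lambda d: f(w, d[0]), reverse=True)
print([label for _, label in ranking])
```

Replacing any one component (for example, a different loss function) gives a different learning-to-rank method while the other three stay in place.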

In the machine learning literature, discriminative methods have been widely used to combine different kinds of features without the need to define a probabilistic framework that represents the generation of objects and the correctness of prediction. In this sense, previous works that train generative ranking models are not categorized as “learning-to-rank” methods here. Readers interested in such works may refer to [45, 52, 93], among others.

Discriminative training is an automatic learning process based on the training data. This is also highly desirable for real search engines, because every day these search engines receive a large amount of user feedback and usage logs. It is very important to learn from this feedback automatically and to improve the ranking mechanism continually.

Because of these two characteristics, learning to rank has been widely used in commercial search engines,¹⁸ and it has also attracted great attention from the academic research community.

1.3.3 Learning-to-Rank Framework

Figure 1.6 shows the typical “learning-to-rank” flow. As the figure shows, since learning to rank is a kind of supervised learning, a training set is needed. The creation of a training set is very similar to the creation of the test set for evaluation.

For example, a typical training set consists of $n$ training queries $q_i$ ($i = 1, \ldots, n$), their associated documents represented by feature vectors $x^{(i)} = \{x_j^{(i)}\}_{j=1}^{m^{(i)}}$ (where

18 See http://blog.searchenginewatch.com/050622-082709, http://blogs.msdn.com/msnsearch/archive/2005/06/21/431288.aspx, and http://glinden.blogspot.com/2005/06/msn-search-and-learning-to-rank.html.
