Introduction - Learning to Rank for Information Retrieval

Fig. 1.5 Machine learning framework

1.3.2 Definition of Learning to Rank

In recent years, machine learning technologies have increasingly been used to train ranking models, and a new research area named “learning to rank” has gradually emerged. In the past several years in particular, learning to rank has become one of the most active research areas in information retrieval.

In general, we call all methods that use machine learning technologies to solve the problem of ranking “learning-to-rank” methods.¹⁵ Examples include work on relevance feedback¹⁶ [24, 66] and on automatically tuning the parameters of existing information retrieval models [36, 75]. However, most state-of-the-art learning-to-rank algorithms learn the optimal way of combining features extracted from query-document pairs through discriminative training. Therefore, in this book we define learning to rank in a narrower and more specific way to better summarize these algorithms. That is, we call a ranking method a learning-to-rank method if it has the following two properties.

Feature Based “Feature based” means that all the documents under investigation are represented by feature vectors,¹⁷ reflecting the relevance of the documents to the query. That is, for a given query q, its associated document d can be represented by a vector x = Φ(d, q), where Φ is a feature extractor. Typical features used in learning to rank include the frequencies of the query terms in the document, the outputs of the BM25 model and the PageRank model, and even the relationship between this document and other documents. These features can be extracted from the index of a

15 In the literature of machine learning, there is a topic named label ranking. It predicts the ranking of multiple class labels for an individual document, not the ranking of documents. In this regard, it is largely different from the task of ranking for information retrieval.

16 We will further discuss the relationship between relevance feedback and learning to rank in Chap. 2.

17 Note that, in this book, when we refer to a document, we will no longer use d. Instead, we will directly use its feature representation x. Furthermore, since our discussions will focus more on the learning process, we will always assume that the features are pre-specified, and will not specifically discuss how to extract them.
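To make the feature-based representation x = Φ(d, q) concrete, here is a minimal Python sketch of a feature extractor producing the three kinds of features named above (query-term frequency, a BM25 score, and a PageRank value), followed by a simple linear combination of those features. Everything in it is an illustrative assumption rather than the book's method: the names `phi`, `bm25`, `collection_stats` and `rank` are hypothetical, tokenization is naive whitespace splitting, BM25 uses the common defaults k1 = 1.2 and b = 0.75 with a non-negative IDF variant, the PageRank values are made up rather than computed from a link graph, and the weights are fixed by hand, whereas a learning-to-rank method would learn them through discriminative training.

```python
import math
from collections import Counter


def tokenize(text):
    # Deliberately naive (assumption): lowercase and split on whitespace.
    return text.lower().split()


def bm25(q_terms, d_terms, doc_freq, num_docs, avg_len, k1=1.2, b=0.75):
    """Okapi BM25 score of one document for one query (non-negative IDF variant)."""
    tf = Counter(d_terms)
    score = 0.0
    for t in q_terms:
        n_t = doc_freq.get(t, 0)
        idf = math.log((num_docs - n_t + 0.5) / (n_t + 0.5) + 1.0)
        # Document-length normalization relative to the collection average.
        norm = k1 * (1.0 - b + b * len(d_terms) / avg_len)
        score += idf * tf[t] * (k1 + 1.0) / (tf[t] + norm)
    return score


def collection_stats(docs):
    """Document frequencies, collection size and average length (needed by BM25)."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(t for terms in tokenized for t in set(terms))
    return {"df": df, "N": len(docs), "avg_len": sum(map(len, tokenized)) / len(docs)}


def phi(doc, query, stats, pagerank):
    """Hypothetical feature extractor: returns the feature vector x = Phi(d, q)."""
    q_terms, d_terms = tokenize(query), tokenize(doc)
    tf = Counter(d_terms)
    return [
        float(sum(tf[t] for t in q_terms)),  # total frequency of query terms in d
        bm25(q_terms, d_terms, stats["df"], stats["N"], stats["avg_len"]),
        pagerank,  # query-independent feature, assumed precomputed
    ]


def rank(docs, query, pageranks, weights):
    """Score each feature vector with a linear model w . x; return indices, best first."""
    stats = collection_stats(docs)
    xs = [phi(d, query, stats, pr) for d, pr in zip(docs, pageranks)]
    scores = [sum(w * v for w, v in zip(weights, x)) for x in xs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


docs = [
    "learning to rank for information retrieval",
    "information retrieval models such as bm25",
    "cooking recipes for pasta",
]
pageranks = [0.2, 0.5, 0.3]  # made-up values for illustration only
weights = [0.1, 1.0, 0.5]    # hypothetical; a learning-to-rank method would learn these
print(rank(docs, "learning to rank", pageranks, weights))  # -> [0, 1, 2]
```

Only the first document contains the query terms, so it is ranked first on its frequency and BM25 features; the other two tie at zero on those features and are ordered by PageRank alone. This also illustrates the difference between features: PageRank enters x unchanged for every query, while the term-frequency and BM25 features depend on both d and q.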
