Fig. 1.5 Machine learning framework
1.3.2 Definition of Learning to Rank
In recent years, more and more machine learning technologies have been used to
train the ranking model, and a new research area named “learning to rank” has
gradually emerged. Especially in the past several years, learning to rank has become
one of the most active research areas in information retrieval.
In general, we call all methods that use machine learning technologies to
solve the problem of ranking “learning-to-rank” methods. 15 Examples include the
work on relevance feedback 16 [24, 66] and automatically tuning the parameters of
existing information retrieval models [36, 75]. However, most of the state-of-the-art
learning-to-rank algorithms learn the optimal way of combining features extracted
from query-document pairs through discriminative training. Therefore, in this topic
we define learning to rank in a narrower and more specific way to better summarize
these algorithms. That is, we call those ranking methods that have the following two
properties learning-to-rank methods.
Feature Based “Feature based” means that all the documents under investigation
are represented by feature vectors, 17 reflecting the relevance of the documents to the
query. That is, for a given query q, its associated document d can be represented by a
vector x = Φ(d, q), where Φ is a feature extractor. Typical features used in learning
to rank include the frequencies of the query terms in the document, the outputs of
the BM25 model and the PageRank model, and even the relationship between this
document and other documents. These features can be extracted from the index of a
15 In the literature of machine learning, there is a topic named label ranking. It predicts the ranking
of multiple class labels for an individual document, but not the ranking of documents. In this regard,
it is largely different from the task of ranking for information retrieval.
16 We will make further discussions on the relationship between relevance feedback and learning
to rank in Chap. 2.
17 Note that, in this topic, when we refer to a document, we will not use d any longer. Instead, we
will directly use its feature representation x . Furthermore, since our discussions will focus more on
the learning process, we will always assume the features are pre-specified, and will not purposely
discuss how to extract them.
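
To make the notation x = Φ(d, q) concrete, the following is a minimal sketch of a feature extractor. The function name extract_features, the whitespace tokenization, and the three features chosen (total query-term frequency, query-term coverage, document length) are assumptions made for illustration only; they are not the feature set used in the text, and features such as BM25 or PageRank scores would in practice be supplied by a search engine.

```python
from collections import Counter
from typing import List


def extract_features(document: str, query: str) -> List[float]:
    """Hypothetical feature extractor Phi(d, q) returning a small feature vector x."""
    # Naive whitespace tokenization (an assumption; real systems use proper analyzers).
    doc_terms = document.lower().split()
    query_terms = query.lower().split()
    counts = Counter(doc_terms)

    # Feature 1: total number of occurrences of the query terms in the document.
    tf_sum = float(sum(counts[t] for t in query_terms))

    # Feature 2: fraction of distinct query terms that appear in the document.
    distinct = set(query_terms)
    if distinct:
        coverage = sum(1 for t in distinct if counts[t] > 0) / len(distinct)
    else:
        coverage = 0.0

    # Feature 3: document length in tokens.
    length = float(len(doc_terms))

    return [tf_sum, coverage, length]


# x is the feature representation of the query-document pair.
x = extract_features("learning to rank for information retrieval", "learning rank")
print(x)
```

Once every query-document pair is mapped to such a vector, a ranking model only ever sees x, which is why the discussion can drop d and work with the feature representation directly.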