The Pointwise Approach - Learning to Rank for Information Retrieval


• The Rocchio algorithm learns the model parameter from the feedback on a given query, and then uses the model to rank the documents associated with the same query. It does not consider the generalization of the model across queries. However, in learning to rank, we learn the ranking model from a training set, and mainly use it to rank the documents associated with unseen test queries.

• The model parameter w in the Rocchio algorithm has a physical meaning: it is the updated query vector. In learning to rank, by contrast, the model parameter has no such meaning and only corresponds to the importance of each feature to the ranking task.

• The goal of the Rocchio algorithm is to update the query formulation for better retrieval, not to learn an optimal ranking function. In other words, after the query is updated, the fixed ranking function (e.g., the VSM model) is used to return a new set of related documents.
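
The contrast drawn in the three points above can be made concrete with a minimal sketch of the commonly cited textbook form of the Rocchio update, which may differ in detail from the formulation this section refers to. The function names, the weights alpha, beta and gamma, and the toy vectors are illustrative assumptions, not taken from the text. The sketch shows that the learned parameter w is simply a new query vector, which is then passed to an unchanged cosine-similarity (VSM-style) ranking function.

```python
import math


def rocchio_update(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward the relevant centroid and away from the
    non-relevant centroid. The weights are illustrative defaults."""
    dim = len(query)

    def centroid(docs):
        if not docs:
            return [0.0] * dim
        return [sum(d[i] for d in docs) / len(docs) for i in range(dim)]

    rel_c, non_c = centroid(relevant), centroid(nonrelevant)
    return [alpha * query[i] + beta * rel_c[i] - gamma * non_c[i] for i in range(dim)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query, docs):
    """Fixed ranking function: sort document ids by cosine similarity."""
    return sorted(docs, key=lambda name: cosine(query, docs[name]), reverse=True)


# Toy term-weight vectors (hypothetical data).
docs = {"d1": [1.0, 0.0, 0.0], "d2": [0.0, 1.0, 0.0], "d3": [0.6, 0.8, 0.0]}
q = [1.0, 0.2, 0.0]

before = rank(q, docs)
# Feedback for this one query: d2 judged relevant, d1 judged non-relevant.
w = rocchio_update(q, relevant=[docs["d2"]], nonrelevant=[docs["d1"]])
after = rank(w, docs)

print(before, after, w)
```

Note that w is only meaningful for the query that produced the feedback; nothing in the sketch transfers to a different, unseen query, which is the gap that learning to rank addresses.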

2.5.2 Problems with the Pointwise Approach

Since the input object in the pointwise approach is a single document, the relative order between documents cannot be naturally considered in its learning process. However, ranking is more about predicting relative order than about predicting an accurate relevance degree.
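
A small numeric sketch may help here. It assumes a squared-error pointwise loss and made-up scores for three documents of one query; both the loss choice and the numbers are illustrative assumptions. It shows that a model can have a much lower pointwise loss while producing a worse ordering.

```python
def squared_loss(scores, labels):
    """Pointwise loss: sum of per-document squared errors."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels))


def ranking(scores):
    """Document indices sorted by descending score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


labels = [2, 1, 0]          # graded relevance of three documents

model_a = [1.4, 1.5, 0.2]   # close to the labels, but swaps documents 0 and 1
model_b = [5.0, 3.0, 1.0]   # far from the labels, but in the correct order

loss_a = squared_loss(model_a, labels)
loss_b = squared_loss(model_b, labels)

print(loss_a, ranking(model_a))
print(loss_b, ranking(model_b))
```

Under the pointwise loss, model_a looks far better than model_b, yet only model_b reproduces the ideal ordering of the documents.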

Furthermore, the pointwise approach cannot properly account for the two intrinsic properties of the evaluation measures for ranking (i.e., that they are query-level and position-based):

1. These algorithms ignore the fact that some documents are associated with the same query and others are not. As a result, when the number of associated documents varies greatly across queries (see footnote 4), the overall loss function will be dominated by those queries with a large number of documents.

2. The position of each document in the ranked list is invisible to the pointwise loss functions. Therefore, a pointwise loss function may unknowingly overemphasize unimportant documents (those ranked low in the final ranked list, which thus do not affect the user experience).
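
Both problems can be illustrated with a short sketch. It again assumes a squared-error pointwise loss; the query names, document counts, and scores are hypothetical. The first part shows how a query with many documents dominates the summed loss, and, purely as a contrast, what happens when each query's loss is divided by its document count. The second part shows that the loss assigns the same penalty to an error at the top of the list and an error at the bottom.

```python
def squared_loss(scores, labels):
    """Pointwise loss: sum of per-document squared errors."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels))


def ranking(scores):
    """Document indices sorted by descending score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Problem 1: every document has the same error (0.5), but the queries
# differ greatly in how many documents they have.
data = {
    "popular": ([0.5] * 1000, [1] * 1000),
    "tail": ([0.5] * 10, [1] * 10),
}
per_query = {q: squared_loss(s, y) for q, (s, y) in data.items()}
total = sum(per_query.values())
share = {q: loss / total for q, loss in per_query.items()}

# Contrast (an illustrative normalization, not a specific published method):
# average the loss within each query so that queries count equally.
normalized = {q: per_query[q] / len(data[q][1]) for q in data}

# Problem 2: the same error magnitude at the top and at the bottom.
labels = [3, 2, 1, 0]
top_error = [1.5, 2.0, 1.0, 0.0]      # swaps the two highest-ranked documents
bottom_error = [3.0, 2.0, -0.5, 0.0]  # swaps the two lowest-ranked documents
loss_top = squared_loss(top_error, labels)
loss_bottom = squared_loss(bottom_error, labels)

print(share, normalized)
print(loss_top, ranking(top_error), loss_bottom, ranking(bottom_error))
```

In this toy setting the popular query accounts for roughly 99% of the summed loss even though both queries are predicted equally badly per document, and the two mistakes in the second part receive an identical loss although only one of them changes what a user sees first.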

2.5.3 Improved Algorithms

In order to avoid the problems with the pointwise approach as mentioned above, RankCosine [17] introduces a query-level normalization factor to the pointwise loss

4 For the re-ranking scenario, the number of documents to rank for each query may be very similar, e.g., the top 1000 documents per query. However, if we consider all the documents containing the query word, the difference between the number of documents for popular queries and that for tail queries may be very large.
