The Rocchio algorithm learns the model parameter from the feedback on a given
query, and then uses the model to rank the documents associated with the same
query. It does not consider the generalization of the model across queries. In
learning to rank, by contrast, we learn the ranking model from a training set and
mainly use it to rank the documents associated with unseen test queries.
The model parameter w in the Rocchio algorithm has a physical meaning: it is
the updated query vector. In learning to rank, however, the model parameter has
no such meaning and only corresponds to the importance of each feature to the
ranking task.
The goal of the Rocchio algorithm is to update the query formulation for better
retrieval, not to learn an optimal ranking function. In other words, after the
query is updated, a fixed ranking function (e.g., the vector space model, VSM)
is used to return a new set of related documents.
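The contrast above can be made concrete with a minimal sketch of Rocchio-style feedback. The weights alpha, beta, and gamma, the centroid form of the update, and the toy three-term vectors are assumptions taken from the common textbook formulation, not from this section. Only the query vector w changes; the ranking function (cosine similarity, standing in for the VSM) stays fixed.

```python
import math

def rocchio_update(query, relevant, non_relevant,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Return the updated query vector w (weights are assumed defaults)."""
    dim = len(query)

    def centroid(docs):
        # Mean vector of a set of documents; zero vector if the set is empty.
        if not docs:
            return [0.0] * dim
        return [sum(d[i] for d in docs) / len(docs) for i in range(dim)]

    rel_c = centroid(relevant)
    non_c = centroid(non_relevant)
    return [alpha * query[i] + beta * rel_c[i] - gamma * non_c[i]
            for i in range(dim)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank(query, docs):
    """Fixed ranking function: document indices sorted by cosine similarity."""
    return sorted(range(len(docs)),
                  key=lambda i: cosine(query, docs[i]), reverse=True)

# Hypothetical term-weight vectors over three terms.
docs = [
    [2.0, 1.0, 0.0],  # doc 0
    [1.0, 0.0, 1.0],  # doc 1
    [0.0, 1.0, 0.0],  # doc 2
]
query = [1.0, 0.0, 0.0]

before = rank(query, docs)
# Feedback for this query only: doc 1 is relevant, doc 0 is not.
w = rocchio_update(query, relevant=[docs[1]], non_relevant=[docs[0]])
after = rank(w, docs)
```

In this toy setting the feedback moves doc 1 above doc 0, yet nothing learned here carries over to a different query, which is the point of the comparison with learning to rank.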
2.5.2 Problems with the Pointwise Approach
Since the input object in the pointwise approach is a single document, the relative
order between documents cannot be naturally considered in the learning process.
However, ranking is more about predicting the relative order than the accurate
relevance degree.
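A small numeric illustration may help here. The sketch below assumes a squared-error pointwise loss and two hypothetical sets of model scores; neither is taken from the text. It shows that a model can have the lower pointwise loss while producing the wrong order.

```python
def squared_loss(scores, labels):
    """Pointwise loss: sum of squared errors over individual documents."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels))

def ranking(scores):
    """Document indices sorted by descending score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

labels = [1.0, 0.0]      # doc 0 is relevant, doc 1 is not

model_a = [0.55, 0.60]   # scores close to the labels, but in the wrong order
model_b = [3.0, 2.0]     # scores far from the labels, but in the right order

loss_a = squared_loss(model_a, labels)
loss_b = squared_loss(model_b, labels)
order_a = ranking(model_a)
order_b = ranking(model_b)
```

Here model_a wins on the pointwise loss yet places the irrelevant document first, while model_b is penalized heavily even though its ranking is correct.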
Furthermore, the two intrinsic properties of the evaluation measures for ranking
(i.e., query level and position based) cannot be well considered by the pointwise
approach:
1. These algorithms ignore the fact that some documents are associated with the
same query and others are not. As a result, when the number of associated
documents varies widely across queries (see footnote 4), the overall loss function
will be dominated by those queries with a large number of documents.
2. The position of each document in the ranked list is invisible to the pointwise loss
functions. Therefore, a pointwise loss function may inadvertently place too much
emphasis on unimportant documents (those ranked low in the final ranked list,
which thus do not affect the user experience).
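The first of the two points above can be sketched numerically. The per-document loss values and the query names below are made up for illustration, and dividing by the number of documents per query is only one generic way to normalize at the query level; it is not claimed to be the factor used by any particular algorithm.

```python
def total_loss(losses_by_query):
    """Plain pointwise objective: sum over every document of every query."""
    return sum(sum(losses) for losses in losses_by_query.values())

def query_normalized_loss(losses_by_query):
    """Each query contributes its mean per-document loss."""
    return sum(sum(losses) / len(losses)
               for losses in losses_by_query.values())

# Hypothetical per-document losses: a popular query with many documents
# that are fit well, and a tail query with few documents that are fit badly.
losses_by_query = {
    "popular": [0.1] * 1000,
    "tail": [0.5] * 10,
}

plain = total_loss(losses_by_query)
popular_share_plain = sum(losses_by_query["popular"]) / plain

normalized = query_normalized_loss(losses_by_query)
tail_mean = sum(losses_by_query["tail"]) / len(losses_by_query["tail"])
tail_share_normalized = tail_mean / normalized
```

Under the plain sum, the popular query accounts for roughly 95% of the objective even though the tail query is the one ranked poorly; after per-query averaging, the tail query accounts for most of it.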
2.5.3 Improved Algorithms
In order to avoid the problems with the pointwise approach mentioned above,
RankCosine [17] introduces a query-level normalization factor to the pointwise loss
Footnote 4: For the re-ranking scenario, the number of documents to rank for each
query may be very similar, e.g., the top 1000 documents per query. However, if we
consider all the documents containing the query word, the difference between the
number of documents for popular queries and that for tail queries may be very large.