Query-Dependent Ranking - Learning to Rank for Information Retrieval

Information Technology Reference

In-Depth Information

•

Mutual information difference: Given a dataset, one can compute the mutual in-

formation (MI) to represent the dependency of terms in a query. Then the differ-

ence between the MI computed from the aforementioned two datasets is used as

a feature.

•

Usage rate: If query terms appear in title and anchor text frequently, the corre-

sponding query does very possibly belong to the homepage finding task.

•

POS information: Some topic relevance task queries include a verb to explain

what the user wants to know. Therefore, by analyzing the part of speech informa-

tion, one can extract features for query classification.

With the above features, one can use their linear combination to perform query

classification.

After query classification, different algorithms and information are used for the

queries belonging to different categories. For the topic relevance task, the content

information (e.g., TF and IDF ) is emphasized; on the other hand, for the home-

page finding task, the link and URL information (e.g., PageRank, URL types (root,

subroot, path, or file)) is emphasized.

Experimental results show that in this way, the ranking performance can be im-

proved as compared to using a single ranking function.

7.2.2 K Nearest Neighbor-Based Approach

In [ 5 ], Geng et al. perform query-dependent ranking using the K Nearest Neighbor

(KNN) method.

In the method, the training queries are positioned into the query feature space

in which each query is represented by a feature vector. In the test process, given a

query, its k nearest training queries are retrieved and used to learn a ranking model.

Then this ranking model is used to rank the documents associated with the test

query. This method is named KNN Online in [ 5 ]. This idea has been tested with

Ranking SVM [ 7 , 8 ], and the experimental results show significant performance

improvements as compared to using one single ranking function and using query

classification-based query-dependent ranking. The explanations on this experimen-

tal finding are as follows: (i) in the method ranking for a test query is conducted

by leveraging the useful information of the similar training queries and avoiding

the negative effects from the dissimilar ones; (ii) 'soft' classification of queries is

carried out and similar queries are selected dynamically.

Despite the high ranking performance, one may have noticed a serious problem

with the aforementioned method. Its time complexity is too high to use in practice.

For each single test query, a KNN search plus a model training would be needed,

both of which are very costly. In order to tackle this efficiency problem, two alter-

native methods have been developed, namely KNN Offline-1 and KNN Offline-2,

which move the training offline and significantly reduce the test complexity.

KNN Offline-1 works in the following way (see Fig. 7.1 ). First, for each training

query q i , its k nearest neighbors, denoted as N k (q i ) , in the query feature space are

Learning to Rank for Information Retrieval

Search WWH ::

Custom Search

Home