Query-Dependent Ranking - Learning to Rank for Information Retrieval - page 118

Information Technology Reference

In-Depth Information

Fig. 7.2

Illustration of KNN Offline-2

7.2.3 Query Clustering-Based Approach

In [ 1 ], a similar idea to that of [ 5 ] is used to perform query-dependent ranking. The

difference lies in that in [ 1 ] query clustering is used instead of the KNN techniques.

Specifically, training queries are clustered offline. For this purpose, query q is rep-

resented by the point cloud of its associated documents in the feature space, denoted

as D q . Then for two queries q and q , one first conducts PCA to compute the se-

quences of the principle components of D q and D q , denoted as (u 1 ,...,u P ) and

(u 1 ,...,u P ) respectively. Then the similarity between two queries is computed as

the sum of the inner product between two corresponding principle components:

P

u p u p .

1

P

sim (q, q ) =

(7.7)

p

=

1

With this query similarity, all the training queries are clustered using complete-

link agglomerative clustering [ 10 ]into C clusters

. For each cluster

Q k , a model is trained using queries belonging to it. During test time, given a test

query q , one locates the nearest training cluster to the test query as follows:

{

Q 1 ,...,Q C }

sim (q, q ).

arg max

k

max

q ∈

(7.8)

Q k

After that one uses the model trained from that cluster to rank the documents asso-

ciated with the test query.

Next Page

Learning to Rank for Information Retrieval

Search WWH ::

Custom Search

Home