Information Technology Reference
In-Depth Information
Fig. 7.2
Illustration of KNN Offline-2
7.2.3 Query Clustering-Based Approach
In [ 1 ], a similar idea to that of [ 5 ] is used to perform query-dependent ranking. The
difference lies in that in [ 1 ] query clustering is used instead of the KNN techniques.
Specifically, training queries are clustered offline. For this purpose, query q is rep-
resented by the point cloud of its associated documents in the feature space, denoted
as D q . Then for two queries q and q , one first conducts PCA to compute the se-
quences of the principle components of D q and D q , denoted as (u 1 ,...,u P ) and
(u 1 ,...,u P ) respectively. Then the similarity between two queries is computed as
the sum of the inner product between two corresponding principle components:
P
u p u p .
1
P
sim (q, q ) =
(7.7)
p
=
1
With this query similarity, all the training queries are clustered using complete-
link agglomerative clustering [ 10 ]into C clusters
. For each cluster
Q k , a model is trained using queries belonging to it. During test time, given a test
query q , one locates the nearest training cluster to the test query as follows:
{
Q 1 ,...,Q C }
sim (q, q ).
arg max
k
max
q
(7.8)
Q k
After that one uses the model trained from that cluster to rank the documents asso-
ciated with the test query.
Search WWH ::




Custom Search