Information Technology Reference
In-Depth Information
Fig. 7.2
Illustration of KNN Offline-2
7.2.3 Query Clustering-Based Approach
In [
1
], a similar idea to that of [
5
] is used to perform query-dependent ranking. The
difference lies in that in [
1
] query clustering is used instead of the KNN techniques.
Specifically, training queries are clustered offline. For this purpose, query
q
is rep-
resented by the point cloud of its associated documents in the feature space, denoted
as
D
q
. Then for two queries
q
and
q
, one first conducts PCA to compute the se-
quences of the principle components of
D
q
and
D
q
, denoted as
(u
1
,...,u
P
)
and
(u
1
,...,u
P
)
respectively. Then the similarity between two queries is computed as
the sum of the inner product between two corresponding principle components:
P
u
p
u
p
.
1
P
sim
(q, q
)
=
(7.7)
p
=
1
With this query similarity, all the training queries are clustered using complete-
link agglomerative clustering [
10
]into
C
clusters
. For each cluster
Q
k
, a model is trained using queries belonging to it. During test time, given a test
query
q
, one locates the nearest training cluster to the test query as follows:
{
Q
1
,...,Q
C
}
sim
(q, q
).
arg max
k
max
q
∈
(7.8)
Q
k
After that one uses the model trained from that cluster to rank the documents asso-
ciated with the test query.