Database Reference
In-Depth Information
Algorithm 3. The Greedy-Refine algorithm of queries clustering
Input: A pruned query history H with m queries: H = { Q 1 ,…, Q m }, a set of all
Stars: U = { Q i , QC i | Q i H , QC i H }, k
Output: A set of k query clusters H k = { Q 1 , QC 1 ... Q k , QC k }
1. Let B = {} be a buffer that can hold m Q i , QC i
2. While H and k > 0 Do
3. B
4. For each Q i H Do
5. Pick s i = Q i , QC i with minimum r s i from U i = { Q i , QC i | QC i H , | QC i |
= [2, | H | - k + 1]}
6. B B +{ s i }
7. End For
8. Pick s = Q i , QC i with minimum r s from B
9. H H - QC i - { Q i }, H k H k + s , k k - 1
10. End While
11. Return H k
Algorithmic Solution
For clustering queries, we propose a novel approach, which can discover the near-globally optimal so-
lution and has the low time complexity of the algorithm as well. The approach is described as follows.
Observing the solution of the queries clustering, we can find that every representative query connects
with some other queries of H and these connections are like star structures. Here, we call a connection
as a Star . Then we can re-define the queries clustering problem as follows: Let U be the set of all Stars,
i.e., U = { Q i , QC i | Q i H , QC i H }. The cost of each Star s = Q i , QC i U can be denoted as:
c
= ( , ) . Let r s = c s /| QC i | be the performance-price ratio. Our objective is to find a set of
Star S , such that S U , which minimizes the cost and enables that there are k representative queries in
S and any original query Q j H appears at least once at Star s S .
For solving this problem, we propose an approach which consists of two parts: a pre-processing part
and a processing part. In the processing part, we build a sequential permutation k i = { Q i 1 , Q i 2 ,..., Q im } over
H for each query Q i H , where { Q i 1 , Q i 2 ,..., Q im } H and the queries in k i are arranged non-decreasing
according to their cost corresponding to Q i , that is, d ( Q i , Q i 1 ) ≤ d ( Q i , Q i 2 ) ≤... ≤ d ( Q i , Q im ). Such permuta-
tions can help us only consider the first l queries in k i other than all queries in k i when we build the Star
for Q i . Note that, the number l should be choosed appropriately. It can be seen that the complexity of
pre-processing part is O(| H | 2 log| H |), where | H | denotes the number of queries of H .
The task of processing part is to cluster queries by using the Greedy-Refine algorithm (Algorithm 3)
based on the Stars formed in pre-processing part. The input is a set of all Stars formed in preprocessing
part. For each Q i H , the algorithm picks up the Star s i with the minimal r s in U i (the set of all Stars in
U corresponding to Q i , U i U ) and put it in the set B . From the set B , the algorithm chooses the Star s
with the minimal r s and adds it to the objective set H k . And then, the algorithm removes Q i and QC i from
H . The algorithm stops when the set H k has k elements. The output is a set of k pairs of the form Q i ,
d Q Q
Q QC
j
s
i
j
i
 
Search WWH ::




Custom Search