Automatic Categorization of Web Database Query Results - Advanced Database Query Systems

Database Reference

In-Depth Information

Algorithm 3. The Greedy-Refine algorithm of queries clustering

Input: A pruned query history H with m queries: H = { Q 1 ,…, Q m }, a set of all

Stars: U = { ∧ Q i , QC i ∧ | Q i ∧ H , QC i ∧ H }, k

Output: A set of k query clusters H k = { ∧ Q 1 , QC 1 ∧ ... ∧ Q k , QC k ∧ }

1. Let B = {} be a buffer that can hold m ∧ Q i , QC i ∧

2. While H ≠ ∧ and k > 0 Do

3. B ← ∧

4. For each Q i ∧ H Do

5. Pick s i = ∧ Q i , QC i ∧ with minimum r s i from U i = { ∧ Q i , QC i ∧ | QC i ∧ H , | QC i |

= [2, | H | - k + 1]}

6. B ← B +{ s i }

7. End For

8. Pick s = ∧ Q i , QC i ∧ with minimum r s from B

9. H ← H - QC i - { Q i }, H k ← H k + s , k ← k - 1

10. End While

11. Return H k

Algorithmic Solution

For clustering queries, we propose a novel approach, which can discover the near-globally optimal so-

lution and has the low time complexity of the algorithm as well. The approach is described as follows.

Observing the solution of the queries clustering, we can find that every representative query connects

with some other queries of H and these connections are like star structures. Here, we call a connection

as a Star . Then we can re-define the queries clustering problem as follows: Let U be the set of all Stars,

i.e., U = { ∧ Q i , QC i ∧ | Q i ∧ H , QC i ∧ H }. The cost of each Star s = ∧ Q i , QC i ∧ ∧ U can be denoted as:

= ∑ ∈ ( , ) . Let r s = c s /| QC i | be the performance-price ratio. Our objective is to find a set of

Star S , such that S ∧ U , which minimizes the cost and enables that there are k representative queries in

S and any original query Q j ∧ H appears at least once at Star s ∧ S .

For solving this problem, we propose an approach which consists of two parts: a pre-processing part

and a processing part. In the processing part, we build a sequential permutation k i = { Q i 1 , Q i 2 ,..., Q im } over

H for each query Q i ∧ H , where { Q i 1 , Q i 2 ,..., Q im } ∧ H and the queries in k i are arranged non-decreasing

according to their cost corresponding to Q i , that is, d ( Q i , Q i 1 ) ≤ d ( Q i , Q i 2 ) ≤... ≤ d ( Q i , Q im ). Such permuta-

tions can help us only consider the first l queries in k i other than all queries in k i when we build the Star

for Q i . Note that, the number l should be choosed appropriately. It can be seen that the complexity of

pre-processing part is O(| H | 2 log| H |), where | H | denotes the number of queries of H .

The task of processing part is to cluster queries by using the Greedy-Refine algorithm (Algorithm 3)

based on the Stars formed in pre-processing part. The input is a set of all Stars formed in preprocessing

part. For each Q i ∧ H , the algorithm picks up the Star s i with the minimal r s in U i (the set of all Stars in

U corresponding to Q i , U i ∧ U ) and put it in the set B . From the set B , the algorithm chooses the Star s

with the minimal r s and adds it to the objective set H k . And then, the algorithm removes Q i and QC i from

H . The algorithm stops when the set H k has k elements. The output is a set of k pairs of the form ∧ Q i ,

d Q Q

Q QC

Search WWH ::

Custom Search

Home