Database Reference
In-Depth Information
Query Refinement and Clustering
Query Prune
The query pruning algorithm is based on the following heuristics: (i) queries with empty answers are
not useful, (ii) in the same session, a user often starts with a query with general conditions and return
many answers, and then continuously refines the previous query until the query returns a few interesting
answers. Therefore, only the last query in such a refinement sequence is important. The queries prune
algorithm is shown in Algorithm 2.
We identify the relationship “ ”between queries by using the following method. Let Q i 's condition
on attribute A i ( i = 1,…, m ) is a i 1 A i a i 2 , and Q j 's condition on attribute A i is a' i 1 A i a' i 2 ,. Q i Q j if
for every condition in Q i , Q j either does not contain any condition on A i or has a condition a' i 1 A i
a' i 2 , such that a' i 1 a i 1 A i a i 2 a' i 2 ,. For simplicity, we use H to denote the pruned query history H '
in the following sections.
Query Clustering
Since there are too many queries in the query history, we should cluster the similar queries into the same
cluster and find the representative queries.
The Problem of Queries Clustering
In order to quantify the similarity between the query Q 1 and Q 2 , we adopt a typical definition of similar-
ity, the cosine similarity. For this to be defined we first need to form the vector representations of query
Q 1 and Q 2 . Consider the set Δ of all distinct <attribute, attribute-value> pairs appearing in the D , that is,
Δ = {< A i , a i > | i {1,…, d } and a Dom( A i )}. Since Dom( A i ) is the active domain of attribute A i the
cardinality of this set is finite. Let it be N = |Δ| and let O D be an arbitrary but fixed order on the pairs ap-
pearing in Δ. We refer to the i i-th element of Δ based on the ordering O D by Δ[ i ]. A vector representation
of query Q 1 = j m ( A j θ a j ) is a binary vector V Q 1 of size N . The i i-th element of the vector corresponds to
pair Δ[ i ]. If Δ[ i ] is contained in the conjunctions of Q 1 then V Q 1 [ i ] = 1. Otherwise it is 0. Analogously,
the vector representation of a query Q 2 is a binary vector V Q 2 of size N . The i i-th element of the vector
corresponds to pair Δ[ i ]. If Δ[ i ] is contained in the conjuncts of Q 2 , then V Q 2 [ i ] = 1; otherwise it is 0.
Algorithm 2. The queries pruning algorithm
Input : query history H , database D
Output : pruned query history H '
1. Eliminate queries with empty answers by executing the query over D .
2. Order the remaining queries by session ID in chronological order.
3. For each sequence ( Q i , U i , F i ), …, ( Q j , U j , F j ) in H such that U i = U i +1 = … = U j
and Q i Q i +1 Q j and ( Q j ∧∧ Q j +1 or U j U j +1 ).
4. Eliminate all queries Q i ,…, Q j -1 .
Search WWH ::




Custom Search