Databases Reference
In-Depth Information
3.3.3
Top-
k
Query Answering
The main challenge in designing the algorithm for returning top-k query answers is
to only perform the necessary reformulations at every step and halt when the top-k
answers are found. We focus on top-k query answering for by-table semantics, and
the algorithm can be modified for by-tuple semantics.
Recall that in by-table query answering, the probability of an answer is the sum of
the probabilities of the reformulated queries that generate the answer. Our goal is to
reduce the number of reformulated queries we execute. The algorithm we describe
next proceeds in a greedy fashion: it executes queries in descending order of prob-
abilities. For each tuple t, it maintains the upper bound p max .t/ and lower bound
p min .t/ of its probability. This process halts when it finds k tuples whose p min values
are higher than p max of the rest of the tuples.
T OP KB Y T ABLE takes as input an SPJ query Q, a schema p-mapping pM,an
instance D S of the source schema, and an integer k, and outputs the top-k answers
in Q table .D S /. The algorithm p rocee ds in three steps.
Step 1: Rewrite Q according to pM into a set of queries Q 1 ;:::;Q n , each with a
probability assigned in a similar way as stated in Algorithm B Y T ABLE .
Step 2: Execute Q 1 ;:::;Q n in descending order of their probabilities. Maintain the
following measures:
The highest probability, PMax, for the tuples that have not been generated yet.
We initialize PMax to 1; after executing query Q i and updating the list of
answers (see third bullet), we decrease PMax by Pr.Q i /;
The threshold th determining which answers are potentially in the top-k.We
initialize th to 0; after executing Q i and updating the answer list, we set th to
the kth largest p min for tuples in the answer list;
A list L of answers whose p max is no less than th, and bounds p min and p max for
each answer in L. After executing query Q i , we update the list as follows: (1)
for each t 2 L and t 2 Q i .D S /, we increase p min .t/ by Pr.Q i /; (2) for each
t 2 L but t 62 Q i .D S /, we decrease p max .t/ by Pr.Q i /;(3)ifPMax th,for
each t 62 L but t 2 Q i .D S /,insertt to L,setp min to Pr.Q i /,andsetp max .t/
to PMax.
A list T of k tuples with top p min values.
Step 3: When th >PMaxand for each t 62 T , th >p max .t/, halt and return T .
Example 7. Consider Example 1 where we seek for top-1 answer. We answer the
reformulated queries in order of Q 1 ;Q 2 ;Q 3 . After answering Q 1 , for tuple (“Sun-
nyvale”) we have p min D 0:5 and p max D 1, and for tuple (“Mountain View”) we
have the same bounds. In addition, PMax D 0:5 and th D 0:5.
In the second round, we answer Q 2 . Then, for tuple (“Sunnyvale”), we have
p min
D 0:9 and p max
D 1, and for tuple (“Mountain View”), we have p min
D 0:5
and p max D 0:6.NowPMax D 0:1 and th D 0:9.
Because th>PMax and this above the p max for the (“Mountain View”) tuple,
we can halt and return (“Sunnyvale”) as the top-1 answer.
t
 
Search WWH ::




Custom Search