Uncertainty in Data Integration and Dataspace Support Platforms - Schema Matching and Mapping

Databases Reference

In-Depth Information

3.3.3

Top-

Query Answering

The main challenge in designing the algorithm for returning top-k query answers is

to only perform the necessary reformulations at every step and halt when the top-k

answers are found. We focus on top-k query answering for by-table semantics, and

the algorithm can be modified for by-tuple semantics.

Recall that in by-table query answering, the probability of an answer is the sum of

the probabilities of the reformulated queries that generate the answer. Our goal is to

reduce the number of reformulated queries we execute. The algorithm we describe

next proceeds in a greedy fashion: it executes queries in descending order of prob-

abilities. For each tuple t, it maintains the upper bound p max .t/ and lower bound

p min .t/ of its probability. This process halts when it finds k tuples whose p min values

are higher than p max of the rest of the tuples.

T OP KB Y T ABLE takes as input an SPJ query Q, a schema p-mapping pM,an

instance D S of the source schema, and an integer k, and outputs the top-k answers

in Q table .D S /. The algorithm p rocee ds in three steps.

Step 1: Rewrite Q according to pM into a set of queries Q 1 ;:::;Q n , each with a

probability assigned in a similar way as stated in Algorithm B Y T ABLE .

Step 2: Execute Q 1 ;:::;Q n in descending order of their probabilities. Maintain the

following measures:

The highest probability, PMax, for the tuples that have not been generated yet.

We initialize PMax to 1; after executing query Q i and updating the list of

answers (see third bullet), we decrease PMax by Pr.Q i /;

The threshold th determining which answers are potentially in the top-k.We

initialize th to 0; after executing Q i and updating the answer list, we set th to

the kth largest p min for tuples in the answer list;

A list L of answers whose p max is no less than th, and bounds p min and p max for

each answer in L. After executing query Q i , we update the list as follows: (1)

for each t 2 L and t 2 Q i .D S /, we increase p min .t/ by Pr.Q i /; (2) for each

t 2 L but t 62 Q i .D S /, we decrease p max .t/ by Pr.Q i /;(3)ifPMax th,for

each t 62 L but t 2 Q i .D S /,insertt to L,setp min to Pr.Q i /,andsetp max .t/

to PMax.

A list T of k tuples with top p min values.

Step 3: When th >PMaxand for each t 62 T , th >p max .t/, halt and return T .

Example 7. Consider Example 1 where we seek for top-1 answer. We answer the

reformulated queries in order of Q 1 ;Q 2 ;Q 3 . After answering Q 1 , for tuple (“Sun-

nyvale”) we have p min D 0:5 and p max D 1, and for tuple (“Mountain View”) we

have the same bounds. In addition, PMax D 0:5 and th D 0:5.

In the second round, we answer Q 2 . Then, for tuple (“Sunnyvale”), we have

p min

D 0:9 and p max

D 1, and for tuple (“Mountain View”), we have p min

D 0:5

and p max D 0:6.NowPMax D 0:1 and th D 0:9.

Because th>PMax and this above the p max for the (“Mountain View”) tuple,

we can halt and return (“Sunnyvale”) as the top-1 answer.

Schema Matching and Mapping

Search WWH ::

Custom Search

Home