Advanced Techniques - Probabilistic Databases

Database Reference

In-Depth Information

6. ADVANCED TECHNIQUES

For the next lemma and the discussion following it, we will assume without loss of generality

that all intervals

are strict, i.e., L i <U i for i = 1 ,n . To ensure that this holds, we can

increase all upper bounds U i by a small amount ε> 0. Then, the following holds:

[ L i ,U i ]

Lemma 6.1 Denoting c(S) the complement of a set S, the following properties hold:

T k =∅

B k ∩

B k ⊆

c(Top k )

T k =∅

T k ⊆

Top k

Proof. It is easy to see that B k ∩ T k =∅

:if i ∈ B k and i ∈ T k , then U i ≤ L c

≤ L i , implying L i = U i ,

B k ∩ T k =∅

which contradicts our assumption that the intervals are strict.

is similar.

The claim B k ∩

Top k =∅

follows from the fact that every tuple in B k has a probability dom-

T k , and the latter set has at least k tuples. T k ⊆

inated by all tuples in

Top k

is by duality.

When the critical region becomes empty, L c

≥ U c , then we have identified Top k .

T k =

B k =

(1) If the critical region is empty, i.e., L c

U c , then T k =

≥

Top k and B k =

Lemma 6.2

| T k |= k or

| B k |= n − k, then the critical region is empty.

c(Top k ). (2) If

Proof. (1) When L c

≥ U c , then

T k ⊆ T k . By the previous lemma, T k ⊆

Top k . But we also have

| T k |≥

k and

Top k |=

k ; hence, all three sets are equal.

(2) Assume

| T k |= k (the case

| B k |= n − k is dual, and omitted). By the definition of T k ,

T k , we have U c

U c . It follows that L c

for every i

∈

≤

L i ; thus, there are k values L i that are

≥

max k (L 1 ,...,L n ) ≥ U c , proving that the critical region is empty.

Thus, our goal is to simulate the objects G i until the critical region becomes empty or, equiv-

alently, until T k contains k elements, then return T k . Our algorithm will therefore simulate the

objects G 1 ,G 2 ,...,G n only until the critical region becomes empty, then will return Top k = T k .

The question is, which object to choose to simulate at each step.

Call a tuple t i a crosser if L i ≤ L c and U c

≤ U i . If the critical region is non-empty, then a crosser

is any interval that contains the critical region. There are always at least two crossers because every

non-crosser

satisfies either L c <L i

or U i <U c ; there are, at most, k − 1 non-crossers of

[ L i ,U i ]

the first kind (since L c

max k (L 1 ,...,L n ) ), and, at most, n

−

1 ) non-crossers of the second

kind (since U c

= min n − k (U 1 ,...,U n ) ), and these two sets are disjoint when L c

≤ U c ; hence, there

exists at least 2 crossers.

This observation allows us to prove that it is necessary to simulate until the critical region

becomes empty: otherwise, the set Top k

is not uniquely defined.

If the critical region is non-empty, then Top k

is not uniquely defined by the intervals

Lemma 6.3

[ L 1 ,U 1 ] ,..., [ L n ,U n ]

Probabilistic Databases

Search WWH ::

Custom Search

Home