Database Reference
In-Depth Information
6. ADVANCED TECHNIQUES
For the next lemma and the discussion following it, we will assume without loss of generality
that all intervals
are strict, i.e., L i <U i for i = 1 ,n . To ensure that this holds, we can
increase all upper bounds U i by a small amount ε> 0. Then, the following holds:
[ L i ,U i ]
Lemma 6.1 Denoting c(S) the complement of a set S, the following properties hold:
T k =∅
B k
B k
B k
c(Top k )
T k =∅
T k
Top k
Proof. It is easy to see that B k T k =∅
:if i B k and i T k , then U i L c
L i , implying L i = U i ,
B k T k =∅
which contradicts our assumption that the intervals are strict.
is similar.
The claim B k
Top k =∅
follows from the fact that every tuple in B k has a probability dom-
T k , and the latter set has at least k tuples. T k
inated by all tuples in
Top k
is by duality.
When the critical region becomes empty, L c
U c , then we have identified Top k .
T k =
B k =
(1) If the critical region is empty, i.e., L c
U c , then T k =
Top k and B k =
Lemma 6.2
| T k |= k or
| B k |= n k, then the critical region is empty.
c(Top k ). (2) If
Proof. (1) When L c
U c , then
T k T k . By the previous lemma, T k
Top k . But we also have
| T k |≥
k and
|
Top k |=
k ; hence, all three sets are equal.
(2) Assume
| T k |= k (the case
| B k |= n k is dual, and omitted). By the definition of T k ,
T k , we have U c
U c . It follows that L c
for every i
L i ; thus, there are k values L i that are
=
max k (L 1 ,...,L n ) U c , proving that the critical region is empty.
Thus, our goal is to simulate the objects G i until the critical region becomes empty or, equiv-
alently, until T k contains k elements, then return T k . Our algorithm will therefore simulate the
objects G 1 ,G 2 ,...,G n only until the critical region becomes empty, then will return Top k = T k .
The question is, which object to choose to simulate at each step.
Call a tuple t i a crosser if L i L c and U c
U i . If the critical region is non-empty, then a crosser
is any interval that contains the critical region. There are always at least two crossers because every
non-crosser
satisfies either L c <L i
or U i <U c ; there are, at most, k 1 non-crossers of
[ L i ,U i ]
the first kind (since L c
=
max k (L 1 ,...,L n ) ), and, at most, n
(k
+
1 ) non-crossers of the second
kind (since U c
= min n k (U 1 ,...,U n ) ), and these two sets are disjoint when L c
U c ; hence, there
exists at least 2 crossers.
This observation allows us to prove that it is necessary to simulate until the critical region
becomes empty: otherwise, the set Top k
is not uniquely defined.
If the critical region is non-empty, then Top k
is not uniquely defined by the intervals
Lemma 6.3
[ L 1 ,U 1 ] ,..., [ L n ,U n ]
.
Search WWH ::




Custom Search