Database Reference
In-Depth Information
6. ADVANCED TECHNIQUES
For the next lemma and the discussion following it, we will assume without loss of generality
that all intervals
are strict, i.e.,
L
i
<U
i
for
i
=
1
,n
. To ensure that this holds, we can
increase all upper bounds
U
i
by a small amount
ε>
0. Then, the following holds:
[
L
i
,U
i
]
Lemma 6.1
Denoting c(S) the complement of a set S, the following properties hold:
T
k
=∅
B
k
∩
B
k
∩
B
k
⊆
c(Top
k
)
T
k
=∅
T
k
⊆
Top
k
Proof.
It is easy to see that
B
k
∩
T
k
=∅
:if
i
∈
B
k
and
i
∈
T
k
, then
U
i
≤
L
c
≤
L
i
, implying
L
i
=
U
i
,
B
k
∩
T
k
=∅
which contradicts our assumption that the intervals are strict.
is similar.
The claim
B
k
∩
Top
k
=∅
follows from the fact that every tuple in
B
k
has a probability dom-
T
k
, and the latter set has at least
k
tuples.
T
k
⊆
inated by all tuples in
Top
k
is by duality.
When the critical region becomes empty,
L
c
≥
U
c
, then we have identified
Top
k
.
T
k
=
B
k
=
(1) If the critical region is empty, i.e., L
c
U
c
, then T
k
=
≥
Top
k
and B
k
=
Lemma 6.2
|
T
k
|=
k or
|
B
k
|=
n
−
k, then the critical region is empty.
c(Top
k
). (2) If
Proof.
(1) When
L
c
≥
U
c
, then
T
k
⊆
T
k
. By the previous lemma,
T
k
⊆
Top
k
. But we also have
|
T
k
|≥
k
and
|
Top
k
|=
k
; hence, all three sets are equal.
(2) Assume
|
T
k
|=
k
(the case
|
B
k
|=
n
−
k
is dual, and omitted). By the definition of
T
k
,
T
k
, we have
U
c
U
c
. It follows that
L
c
for every
i
∈
≤
L
i
; thus, there are
k
values
L
i
that are
≥
=
max
k
(L
1
,...,L
n
)
≥
U
c
, proving that the critical region is empty.
Thus, our goal is to simulate the objects
G
i
until the critical region becomes empty or, equiv-
alently, until
T
k
contains
k
elements, then return
T
k
. Our algorithm will therefore simulate the
objects
G
1
,G
2
,...,G
n
only until the critical region becomes empty, then will return
Top
k
=
T
k
.
The question is, which object to choose to simulate at each step.
Call a tuple
t
i
a
crosser
if
L
i
≤
L
c
and
U
c
≤
U
i
. If the critical region is non-empty, then a crosser
is any interval that contains the critical region. There are always at least two crossers because every
non-crosser
satisfies either
L
c
<L
i
or
U
i
<U
c
; there are, at most,
k
−
1 non-crossers of
[
L
i
,U
i
]
the first kind (since
L
c
=
max
k
(L
1
,...,L
n
)
), and, at most,
n
−
(k
+
1
)
non-crossers of the second
kind (since
U
c
=
min
n
−
k
(U
1
,...,U
n
)
), and these two sets are disjoint when
L
c
≤
U
c
; hence, there
exists at least 2 crossers.
This observation allows us to prove that it is necessary to simulate until the critical region
becomes empty: otherwise, the set
Top
k
is not uniquely defined.
If the critical region is non-empty, then Top
k
is not uniquely defined by the intervals
Lemma 6.3
[
L
1
,U
1
]
,...,
[
L
n
,U
n
]
.