Database Reference
In-Depth Information
U
c
L
c
t
2
t
1
t
3
t
4
t
5
p
0
1
Example
Empty critical region
L
c
U
c
L
c
U
c
L
c
U
c
Case 1
Case 2
Case 3
Figure 6.1:
Illustration of the multi-simulation technique. The goal is to retrieve the top
k
=
2 highest
probabilities. Intervals represent uncertainty about the value of a tuple's probability score.
We assume that during any simulation step, no lower bound
L
i
decreases, and no upper bound
U
i
increases. It is easy to check that this holds for
Algorithm 3
; for the Monte Carlo approximation
algorithms in
Subsection 5.3.2
, while it fails if we define a simulation step to consis
ts
of one single
Monte Carlo trial, the assumption still holds with high probability is we perform
√
N
trials during
any simulation step [
Ré et al.
,
2007
]. This assumption implies immediately that
L
c
never decreases,
and
U
c
never increases. Thus, the critical region
(L
c
,U
c
)
will only shrink after a simulation step, or
stay the same.
Next, we define four sets of tuples as follows:
B
k
={
i
|
U
i
≤
L
c
T
k
={
i
|
U
c
}
≤
L
i
}
B
k
={
i
|
U
i
≤
U
c
T
k
={
i
|
L
c
}
≤
L
i
}
If the critical region is non-empty (
L
c
<U
c
), then
T
k
⊆
T
k
and
B
k
⊆
B
k
; if the critical region
is empty, then the opposite inclusions hold.
It is always the case that
|
T
k
|≥
k
because, by the definition of
L
c
=
max
k
(L
1
,...,L
n
)
,
there exists
k
lower bounds
L
i
such that
L
i
≥
L
c
. By duality, it follows that
|
B
k
|≥
n
−
k
since
U
c
=
max
k
+
1
(U
1
,...,U
n
)
=
min
n
−
k
(U
1
,...,U
n
)
.