Database Reference
In-Depth Information
Example 6.5 (Approximating top-k probability). Consider three streams A, B, and
C and their quantile summaries in window W
(
{
A
,
B
,
C
}
)
, as shown in Figure 6.2,
where a
i
,b
i
and c
i
(
1
≤
i
≤
3
) are the intervals of W
(
A
)
,W
(
B
)
and W
(
C
)
, respec-
tively. The membership probability of each interval is
3
.
To compute the upper bound of the top-
2
probability of instances falling into b
1
,
we let all instances in b
1
take the maximum value b
1
.
MAX. Moreover, since intervals
.
a
2
and c
2
cover b
1
MAX, we let all instances in a
2
and c
2
take the minimum value
a
2
.
.
MIN and c
2
MIN, respectively. Thus, the probability that A is ranked better than
1
3
, and the probability that C is ranked better than b
1
is at least
b
1
is at least Pr
(
)=
a
1
1
3
. The upper bound of the top-
2
probability of b
1
is Pr
1
3
×
1
3
)=
Pr
(
c
1
)=
(
b
1
)
×
(
1
−
8
27
.
1
9
is the lower bound of the top-
2
Using the similar idea, we can verify that
probability of b
1
.
Using the idea illustrated in Example 6.5 and the Poisson binomial recurrence,
we can have the general upper and lower bounds of the top-
k
probabilities of inter-
vals.
Theorem 6.6 (Upper and lower bounds).
To answer a top-k query on sliding win-
dowW
(
O
)
, for an interval t in a
φ
-quantile summary of W
(
O
)(
O
∈O
)
,
k
1
i
=
0
Pr
(
UDS
(
t
)
,
i
)
−
Pr
k
(
t
)
≤
Pr
(
t
)
and
k
1
i
=
0
Pr
(
LDS
(
t
)
,
i
)
−
Pr
k
(
)
≥
(
)
t
Pr
t
where Pr
k
)=
∑
o
∈
W
(
O
)
,
o
∈
t
Pr
k
o
|
o
∈
t
,
t
is an interval in a
(
(
)
(
)=
{
φ
t
o
,UDS
t
-
O
)
,
O
=
t
.
o
|
o
∈
t
,
t
quantile summary of W
(
,
≥
.
}
(
)=
{
O
MIN
t
MAX
, and LDS
t
O
)
,
O
=
t
.
is an interval in a
φ
-quantile summary of W
(
O
,
MAX
≥
t
.
MIN
}
.
k
−
1
i
k
−
1
j
Proof.
Since
UDS
(
t
)
⊆
DS
(
t
)
,
1
Pr
(
UDS
(
t
)
,
i
)
≥
∑
1
Pr
(
DS
(
t
)
,
j
)
. Similarly,
∑
=
=
k
−
1
j
=
1
Pr
k
−
1
i
=
1
Pr
DS
(
t
)
⊆
LDS
(
t
)
,sowehave
(
DS
(
t
)
,
j
)
≥
∑
(
LDS
(
t
)
,
i
)
. Moreover,
∑
k
−
1
since
Pr
k
(
t
)=
Pr
(
t
)
∑
1
Pr
(
DS
(
t
)
,
j
)
, the conclusions hold.
j
=
Using Theorem 6.6, we can easily derive the upper bound and the lower bound
of a stream by summing up the upper/lower bounds of all intervals of the quantile
summary of the stream. Importantly and interestingly, the difference between the
upper bound and the lower bound of the top-
k
probability is up to 2
φ
, which provides
a strong quality guarantee in approximation.
Pr
k
(
))
Theorem 6.7 (Approximation quality).
For a stream O,
let
U
(
O
and
Pr
k
be the upper bound and the lower bound of Pr
k
L
(
(
))
(
)
O
O
derived from Theo-
Pr
k
Pr
k
rem 6.6, respectively.
.
Proof.
To prove Theorem 6.7, we need the following lemma.
U
(
(
O
))
−L
(
(
O
))
≤
2
φ