Database Reference
In-Depth Information
Pr
k
Pr
k
U
(
(
O
))
−L
(
(
O
))
≤
2
φ
.
O
and an interval
t
∈
Case 2.
If case 1 does not hold, i.e., there is an interval
t
∈
MIN
. That is, interval
t
covers
t
completely, as illustrated in Figure 6.3(b). In that case,
O
such that,
t
.
MAX
and
t
.
W
(
O
)
−
MAX
>
t
.
MIN
≤
t
.
O
≺
U
(
Pr
(
t
i
+
1
))
−
O
≺
O
≺
L
(
(
Pr
t
i
+
1
)) =
Pr
(
t
j
)=φ
. Comparing to Case 1 where
U
(
Pr
(
t
i
+
1
))
−
O
≺
L
(
(
)) =
(
)+
...
+
(
)=(
−
)φ
, the difference between the
upper bound and the lower bound is smaller. Therefore, in Case 2,
Pr
t
i
+
1
Pr
t
j
Pr
t
j
+
x
x
1
Pr
k
U
(
(
))
−
O
Pr
k
L
(
(
))
O
is even smaller than that in Case 1. The theorem holds.
Pr
k
Pr
k
For any object
O
, since
U
(
(
O
))
−L
(
(
O
))
≤
2
φ
, we can simply use
Pr
k
Pr
k
U
(
(
O
))
−L
(
(
O
))
to approximate
Pr
k
(
O
)
.
2
, let Pr
k
Corollary 6.4 (Approximation Quality).
For a stream O
∈
W
(
O
)
(
O
)=
U
(
Pr
k
(
O
))
−L
(
Pr
k
Pr
k
(
O
))
Pr
k
, then
(
O
)
−
(
O
)
≤
φ
.
2
6.3.2 Approximate Quantile Summaries
Although using quantiles we can approximate top-
k
probabilities well, comput-
ing exact quantiles of streams by a constant number of scans still needs
N
p
Ω (
)
space [187]. To reduce the cost in space, we use
-approximate quantile summary
which can still achieve good approximation quality.
ε
Definition 6.5 (
ε
-approximate quantile).
Let
o
1
≺···≺
o
ω
be the sorted list of
instances in a sliding window
W
(
O
)
.An
ε
-approximate
φ
-quantile
(0
<
φ
≤
1) of
W
(
O
)
is an instance
O
l
where
l
∈
[
(φ
−
ε)ω
,
(φ +ε)ω
]
.
An
ε
-approximate
φ
-quantile summary
of
W
(
O
)
is
o
1
and a list of instances
1
o
l
1
,...,
∈
[
(
φ
−
ε)ω
,
(
φ +ε)ω
](
≤
≤
φ
)
o
l
1
/
φ
,
l
i
i
i
1
i
.
The
ε
-approximate
φ
-quantile summary of
W
(
O
)
partitions the instances of
1
φ
W
(
O
)
into
intervals
. The first interval
t
1
=[
o
1
,
o
l
1
]
, and generally the
i
-th
1
φ
)
(
1
<
i
≤
interval
t
i
=(
q
i
−
1
,
q
i
]
.
[(φ
−
ε)ω
,
(φ +
ε)ω ]
The number of instances in each interval is in
2
2
. Since the
1
ω
membership probability of each instance is
, the membership probability of each
interval is within
[φ
−
2
ε
,
φ +
2
ε ]
.
-Approximate quantiles in data streams is well studied [111, 112,
188, 189]. Both deterministic and randomized methods are proposed. In our im-
plementation, we adopt the method of computing approximate quantile summaries
in a sliding window proposed in [188], which is based on the GK-algorithm [112]
that finds the approximate quantile over a data steam. The algorithm can contin-
uously output the
Computing
ε
ε
-approximate quantiles in a sliding window with space cost of
log
2
εω
O
(
)
.
2
ε