Database Reference
In-Depth Information
Pr k
Pr k
U (
(
O
)) −L (
(
O
))
2
φ .
O and an interval t
Case 2. If case 1 does not hold, i.e., there is an interval t
MIN . That is, interval t
covers t completely, as illustrated in Figure 6.3(b). In that case,
O such that, t .
MAX and t .
W
(
O
)
MAX
>
t
.
MIN
t
.
O
U (
Pr
(
t i + 1
))
O
O
L (
(
Pr
t i + 1
)) =
Pr
(
t j )=φ
. Comparing to Case 1 where
U (
Pr
(
t i + 1
))
O
L (
(
)) =
(
)+ ... +
(
)=(
, the difference between the
upper bound and the lower bound is smaller. Therefore, in Case 2,
Pr
t i + 1
Pr
t j
Pr
t j + x
x
1
Pr k
U (
(
))
O
Pr k
L (
(
))
O
is even smaller than that in Case 1. The theorem holds.
Pr k
Pr k
For any object O , since
U (
(
O
)) −L (
(
O
))
2
φ
, we can simply use
Pr k
Pr k
U (
(
O
)) −L (
(
O
))
to approximate Pr k
(
O
)
.
2
, let Pr k
Corollary 6.4 (Approximation Quality). For a stream O
W
( O )
(
O
)=
U ( Pr k
( O )) −L ( Pr k
Pr k
( O ))
Pr k
, then
(
O
)
(
O
) φ
.
2
6.3.2 Approximate Quantile Summaries
Although using quantiles we can approximate top- k probabilities well, comput-
ing exact quantiles of streams by a constant number of scans still needs
N p
Ω (
)
space [187]. To reduce the cost in space, we use
-approximate quantile summary
which can still achieve good approximation quality.
ε
Definition 6.5 (
ε
-approximate quantile). Let o 1 ≺···≺
o ω
be the sorted list of
instances in a sliding window W
(
O
)
.An
ε
-approximate
φ
-quantile (0
< φ
1) of
W
(
O
)
is an instance O l where l
[ ε)ω , (φ +ε)ω ]
.
An
ε
-approximate
φ
-quantile summary of W
(
O
)
is o 1 and a list of instances
1
o l 1 ,...,
[ (
φ ε)ω , (
φ +ε)ω ](
φ )
o l 1 / φ , l i
i
i
1
i
.
The
ε
-approximate
φ
-quantile summary of W
(
O
)
partitions the instances of
1
φ
W
(
O
)
into
intervals . The first interval t 1 =[
o 1 ,
o l 1 ]
, and generally the i -th
1
φ )
(
1
<
i
interval t i =(
q i 1 ,
q i ]
.
[(φ
ε)ω , (φ +
ε)ω ]
The number of instances in each interval is in
2
2
. Since the
1
ω
membership probability of each instance is
, the membership probability of each
interval is within
2
ε , φ +
2
ε ]
.
-Approximate quantiles in data streams is well studied [111, 112,
188, 189]. Both deterministic and randomized methods are proposed. In our im-
plementation, we adopt the method of computing approximate quantile summaries
in a sliding window proposed in [188], which is based on the GK-algorithm [112]
that finds the approximate quantile over a data steam. The algorithm can contin-
uously output the
Computing
ε
ε
-approximate quantiles in a sliding window with space cost of
log 2
εω
O
(
)
.
2
ε
 
Search WWH ::




Custom Search