Database Reference
In-Depth Information
Then, how can we compute the upper bound and the lower bound of the top-
k
probability of a stream in a sliding window using its
ε
-approximate
φ
-quantile
summaries?
Consider an interval
t
i
=(
o
i
−
1
,
o
i
]
in an
ε
-approximate
φ
-quantile summary. Sup-
pose
R
are the actual rank of
o
i
−
1
and
o
i
, respectively. Then, the
actual number of instances in
t
i
is
R
(
o
i
−
1
)
and
R
(
o
i
)
(
o
i
)
−
R
(
o
i
−
1
)
. The membership probability of
R
(
o
i
)
−
R
(
o
i
−
1
)
t
i
is
Pr
(
t
i
)=
. However, since
o
i
−
1
and
o
i
are approximations of the
ω
(
−
)
φ
-quantiles, respectively, their actual ranks are not calculated. In-
stead, we use
Pr
i
1
- and
i
φ
(
t
)=φ
to approximate the membership probability of
t
i
. Since
((
i
−
1
)φ
−
ε)ω
≤
R
(
o
i
−
1
)
≤
((
i
−
1
)φ +ε)ω
, and
(
i
φ
−
ε)ω
≤
R
(
o
i
)
≤
(
i
φ +ε)ω
,
)
−
Pr
we have
.
Then, we compute the upper bound and the lower bound of
Pr
k
Pr
(
t
(
t
)
≤
2
ε
(
t
)
, denoted by
U
(
and
L
(
Pr
k
Pr
k
(
t
))
(
t
))
, respectively, using the approximate membership proba-
bility
Pr
(
)
, following with Theorem 6.6. In sequel, we can further derive the upper
bound and the lower bound of a stream by summing up the upper bound and the
lower bound of all intervals, respectively. The above approximation method has the
following quality guarantee.
t
Theorem 6.8 (Approximation quality).
Given a stream O in a sliding window W,
let
U
(
and
L
(
Pr
k
Pr
k
be the upper and lower bounds of Pr
k
(
))
(
))
(
)
O
O
O
computed
using the
ε
φ
(
)
-approximate
-quantile summary of W
O
, then,
U
(
Pr
k
≤
ε
Pr
k
(
O
))
−U
(
(
O
))
(6.5)
and
L
(
≤
ε
Pr
k
Pr
k
(
O
))
−L
(
(
O
))
(6.6)
Proof.
Consider an
O
. To analyze the approximation
error introduced by
o
i
, we first assume that other quantiles
o
j
(1
ε
-approximate
i
φ
-quantile
o
i
∈
1
φ
≤
j
≤
,
j
=
i
)
are exact. Suppose the real rank of
o
i
is
R
(
o
i
)
, according to the definition of
ε
-
approximate quantile, we have
(
i
φ
−
ε)ω
≤
R
(
o
i
)
≤
(
i
φ +ε)ω
.
are two intervals partitioned by
o
i
. The approx-
imate numbers of instances in
t
i
and
t
i
+
1
are both
t
i
=(
o
i
−
1
,
o
i
]
and
t
i
+
1
=(
o
i
,
o
i
+
1
]
φω
.
If
R
(
o
i
)
<
φ
, then the actual number of instances in
t
i
is
R
(
o
i
)
−
(φ
−
1
)ω
<
φω
, and the actual number of instances in
t
i
+
1
is
(φ +
1
)ω
−
R
(
o
i
)
>
φω
. That is,
there are
φω
−
R
(
o
i
)
≤
εω
instances that are actually in
t
i
+
1
, but are counted into
t
i
due to the
ε
-approximate quantile
o
i
. Thus, the error introduced by
o
i
is at most
φ
(
U
(
Pr
k
Pr
k
(
t
i
))
−U
(
(
t
i
+
1
)))
. Similarly, if
R
(
o
i
)
<
φ
, the error introduced by
φ
(
U
(
Pr
k
Pr
k
o
i
is at most
(
t
i
+
1
))
−U
(
(
t
i
)))
.
Generally,
the
maximum
overall
approximation
error
introduced
by
ε
-approximate quantiles
o
1
,...,
o
1
φ
−
1
is
U
(
= ∑
φ
(
U
(
1
φ
−
1
i
Pr
k
Pr
k
Pr
k
Pr
k
(
O
))
−U
(
(
O
))
(
t
i
))
−U
(
(
t
i
+
1
)))
=
1
≤
φ
(
U
(
Pr
k
Pr
k
(
t
1
))
−U
(
(
t
1
φ
)))
≤
ε