Database Reference
In-Depth Information
Inequality 6.6 can be shown similarly.
of a stream, we use U ( Pr k
( O )) − L ( Pr k
( O ))
For a sliding window W
(
O
)
as an ap-
2
proximation of Pr k
(
O
)
.
Theorem 6.9 (Approximation Quality). For a streamO and sliding windowW
(
O
)
,
let Pr k
U ( Pr k
( O )) − L ( Pr k
Pr k
( O ))
Pr k
(
O
)=
, then
(
O
)
(
O
) φ +ε
.
2
Proof. Following with Theorems 6.8 and 6.7, we have
U (
)) − L (
Pr k
Pr k
Pr k
Pr k
(
O
(
O
)) ≤U (
(
O
)) −L (
(
O
))+
2
ε
2
φ +
2
ε
Theorem 6.9 follows with the above inequality directly.
6.3.3 Space Efficient Algorithms using Quantiles
The deterministic algorithm discussed in Section 6.1 and the sampling algorithm
proposed in Section 6.2 can both be extended using approximate quantile sum-
maries. Due to the loss of information in approximate quantile summaries, the ex-
tension of the deterministic algorithm only provides approximate answers.
Using approximate quantile summaries, each stream in a sliding window is rep-
resented by
1
φ
intervals. The upper bound and the lower bound of the top- k proba-
bility of each interval can be computed using either the deterministic method or the
sampling method.
To compute the upper bound and the lower bound using the deterministic method,
we first sort the maximum values and the minimum values of all intervals in the
ranking order. Then, by scanning the sorted list once, we can compute the approxi-
mate upper bound and the approximate lower bound of the top- k probability of each
interval. For each stream O , we maintain the upper bound and the lower bound of
the number of instances in W
(
O
)
that have been scanned, and the upper bound and
the lower bound of Pr k
so far.
The time complexity of query evaluation in a sliding window using the above
extended algorithm is O
(
O
)
kn 2
φ +
n
φ
n 1
(
log
(
φ ))
. The upper bound of approximation error
.
To compute the upper bound and the lower bound using the sampling method,
we draw m sample units uniformly at random as described in Section 6.2. The dif-
ference is, each sample unit contains an interval from each stream. For each interval
t of stream O , we define an indicator X t U
is
φ +ε
t |
t is an interval of
to the event that
{
O ,
O =
t .
t |
t
O
,
MIN
>
t
.
MAX
} <
k , and an indicator X t L
to the event that
{
is an interval of O ,
O =
t .
k . The indicator is set to 1 if the
event happens; otherwise, the indicator is set to 0. Then,
O
,
MAX
>
t
.
MIN
} <
Pr k
U (
(
t
)) =
E
[
X t U ]
, and
Pr k
is X t U
L (
(
t
)) =
E
[
X t L ]
. Suppose in sample unit s , the value of X t U
, then the ex-
i = 1 X s i
1
m
m
pectation E
[
X t U ]
can be estimated by
. Similarly, E
[
X t L ]
can be estimated
t
U
1 X s i
1
m
i
by
.
=
t
L
 
Search WWH ::




Custom Search