Database Reference
In-Depth Information
Example 6.5 (Approximating top-k probability). Consider three streams A, B, and
C and their quantile summaries in window W
( {
A
,
B
,
C
} )
, as shown in Figure 6.2,
where a i ,b i and c i ( 1
i
3 ) are the intervals of W
(
A
)
,W
(
B
)
and W
(
C
)
, respec-
tively. The membership probability of each interval is 3 .
To compute the upper bound of the top- 2 probability of instances falling into b 1 ,
we let all instances in b 1 take the maximum value b 1
.
MAX. Moreover, since intervals
.
a 2 and c 2 cover b 1
MAX, we let all instances in a 2 and c 2 take the minimum value
a 2
.
.
MIN and c 2
MIN, respectively. Thus, the probability that A is ranked better than
1
3 , and the probability that C is ranked better than b 1 is at least
b 1 is at least Pr
(
)=
a 1
1
3 . The upper bound of the top- 2 probability of b 1 is Pr
1
3 ×
1
3 )=
Pr
(
c 1 )=
(
b 1 ) × (
1
8
27 .
1
9 is the lower bound of the top- 2
Using the similar idea, we can verify that
probability of b 1 .
Using the idea illustrated in Example 6.5 and the Poisson binomial recurrence,
we can have the general upper and lower bounds of the top- k probabilities of inter-
vals.
Theorem 6.6 (Upper and lower bounds). To answer a top-k query on sliding win-
dowW
( O )
, for an interval t in a
φ
-quantile summary of W
(
O
)(
O
∈O )
,
k
1
i = 0 Pr ( UDS ( t ) , i )
Pr k
(
t
)
Pr
(
t
)
and
k
1
i = 0 Pr ( LDS ( t ) , i )
Pr k
(
)
(
)
t
Pr
t
where Pr k
)= o W ( O ) , o t Pr k
o |
o
t ,
t is an interval in a
(
(
)
(
)= {
φ
t
o
,UDS
t
-
O ) ,
O =
t .
o |
o
t ,
t
quantile summary of W
(
,
.
}
(
)= {
O
MIN
t
MAX
, and LDS
t
O ) ,
O =
t .
is an interval in a
φ
-quantile summary of W
(
O
,
MAX
t
.
MIN
}
.
k 1
i
k 1
j
Proof. Since UDS
(
t
)
DS
(
t
)
,
1 Pr
(
UDS
(
t
) ,
i
)
1 Pr
(
DS
(
t
) ,
j
)
. Similarly,
=
=
k
1
j = 1 Pr
k
1
i = 1 Pr
DS
(
t
)
LDS
(
t
)
,sowehave
(
DS
(
t
) ,
j
)
(
LDS
(
t
) ,
i
)
. Moreover,
k
1
since Pr k
(
t
)=
Pr
(
t
)
1 Pr
(
DS
(
t
) ,
j
)
, the conclusions hold.
j
=
Using Theorem 6.6, we can easily derive the upper bound and the lower bound
of a stream by summing up the upper/lower bounds of all intervals of the quantile
summary of the stream. Importantly and interestingly, the difference between the
upper bound and the lower bound of the top- k probability is up to 2
φ
, which provides
a strong quality guarantee in approximation.
Pr k
(
))
Theorem 6.7 (Approximation quality). For a stream O,
let
U (
O
and
Pr k
be the upper bound and the lower bound of Pr k
L (
(
))
(
)
O
O
derived from Theo-
Pr k
Pr k
rem 6.6, respectively.
.
Proof. To prove Theorem 6.7, we need the following lemma.
U (
(
O
)) −L (
(
O
))
2
φ
 
Search WWH ::




Custom Search