Continuous Ranking Queries on Uncertain Streams - Ranking Queries on Uncertain Data


... $U(Pr_k(O)) - L(Pr_k(O)) \le \phi$.

... O and an interval t ∈ ...

Case 2: Case 1 does not hold; that is, there is an interval $t_j \in O'$ such that $t_j.MIN \le t_{i+1}.MIN$ and $t_{i+1}.MAX \le t_j.MAX$. In other words, interval $t_j$ covers $t_{i+1}$ completely, as illustrated in Figure 6.3(b). In that case,

$U(Pr(O' \prec t_{i+1})) - L(Pr(O' \prec t_{i+1})) = Pr(t_j) = \phi$.

Compared with Case 1, where

$U(Pr(O' \prec t_{i+1})) - L(Pr(O' \prec t_{i+1})) = Pr(t_j) + \cdots + Pr(t_{j+x}) = (x+1)\phi$,

the difference between the upper bound and the lower bound is smaller. Therefore, in Case 2, $U(Pr_k(O)) - L(Pr_k(O))$ is even smaller than in Case 1. The theorem holds.
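The two cases can be illustrated numerically. The sketch below is a hypothetical simplification, not the chapter's algorithm: it assumes another stream O′ is summarized only by disjoint intervals (lo, hi] that each carry membership probability φ, that a smaller value ranks higher, and it computes a lower and an upper bound on Pr(O′ ≺ v) that hold for every v in an interval t = [a, b]. The helper name `precede_bounds` and the sample data are illustrative. The gap between the two bounds equals φ times the number of O′ intervals that overlap t: it is φ when a single interval covers t (Case 2) and (x+1)φ when t overlaps x+1 intervals (Case 1).

```python
from fractions import Fraction


def precede_bounds(intervals, phi, a, b):
    """Bounds on Pr(O' precedes v) that hold for every v in t = [a, b].

    intervals: disjoint half-open intervals (lo, hi] summarizing O' (hypothetical),
    each with membership probability phi. Assumes smaller values rank higher.
    """
    # Every instance of an interval lying strictly below a precedes all of t.
    lower = phi * sum(1 for lo, hi in intervals if hi < a)
    # Only intervals starting below b can hold an instance preceding some v in t.
    upper = phi * sum(1 for lo, hi in intervals if lo < b)
    return lower, upper


phi = Fraction(1, 4)
o_prime = [(0, 10), (10, 20), (20, 30), (30, 40)]  # hypothetical summary of O'

# Case 2: t = [12, 18] is covered by the single interval (10, 20].
lo2, up2 = precede_bounds(o_prime, phi, 12, 18)
# Case 1: t = [5, 35] overlaps four intervals, i.e., x = 3.
lo1, up1 = precede_bounds(o_prime, phi, 5, 35)
print(up2 - lo2, up1 - lo1)  # 1/4 1
```

Counting whole intervals is what makes the bounds safe: without finer information, an overlapping interval may contribute all or none of its probability mass.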

For any object O, since $U(Pr_k(O)) - L(Pr_k(O)) \le \phi$, we can simply use the midpoint $\frac{U(Pr_k(O)) + L(Pr_k(O))}{2}$ to approximate $Pr_k(O)$.

Corollary 6.4 (Approximation Quality). For a stream $O \in \mathcal{O}$, let $\widetilde{Pr}_k(O) = \frac{U(Pr_k(O)) + L(Pr_k(O))}{2}$. Then $|\widetilde{Pr}_k(O) - Pr_k(O)| \le \phi$.
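To make the approximation concrete, here is a hypothetical sketch under simplifying assumptions that are not taken from the chapter: the window of O holds ω values partitioned by exact φ-quantiles into 1/φ intervals, and the top-k probability of a single instance is given by a known function g that is nonincreasing in the value (smaller values rank higher). Bounding every instance of interval $t_i$ by g at the endpoints $q_{i-1}$ and $q_i$ yields U and L. Because consecutive intervals share endpoints, the gaps telescope to $U - L = \phi(g(q_0) - g(q_m)) \le \phi$, and the midpoint is within $(U - L)/2$ of the exact value. The names `topk_bounds` and `g` are illustrative.

```python
from fractions import Fraction


def topk_bounds(values, m, g):
    """Return (L, U) bounds of Pr_k(O) from m equal-probability intervals.

    values: sorted instance values in the window of O (hypothetical data).
    g: hypothetical per-instance top-k probability as a function of the value,
       assumed nonincreasing because smaller values rank higher.
    """
    omega = len(values)
    assert omega % m == 0, "assumes every interval holds exactly omega/m instances"
    size = omega // m
    phi = Fraction(1, m)
    # q[0] = o_1 and q[i] = last instance of interval t_i (exact phi-quantiles).
    q = [values[0]] + [values[i * size - 1] for i in range(1, m + 1)]
    # Each instance of t_i lies in [q[i-1], q[i]], so g(q[i]) <= g(v) <= g(q[i-1]).
    upper = sum(phi * g(q[i - 1]) for i in range(1, m + 1))
    lower = sum(phi * g(q[i]) for i in range(1, m + 1))
    return lower, upper


def g(v):
    # Hypothetical nonincreasing top-k probability of a single instance.
    return max(Fraction(0), 1 - Fraction(v, 10))


values = list(range(1, 9))  # omega = 8 instances
lower, upper = topk_bounds(values, m=4, g=g)  # phi = 1/4
midpoint = (lower + upper) / 2  # midpoint estimate of Pr_k(O)
exact = sum(g(v) for v in values) / len(values)  # each instance has probability 1/omega
print(lower, upper, midpoint, exact)  # 1/2 27/40 47/80 11/20
```

Exact fractions are used so that the bound checks are not affected by floating-point rounding.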

6.3.2 Approximate Quantile Summaries

Although quantiles allow us to approximate top-k probabilities well, computing exact quantiles of a stream with a constant number p of scans still requires $\Omega(N^{1/p})$ space [187]. To reduce the space cost, we use an ε-approximate quantile summary, which still achieves good approximation quality.

Definition 6.5 (ε-approximate quantile). Let $o_1 \prec \cdots \prec o_\omega$ be the sorted list of instances in a sliding window $W(O)$. An ε-approximate φ-quantile ($0 < \phi \le 1$) of $W(O)$ is an instance $o_l$ where $l \in [(\phi - \varepsilon)\omega, (\phi + \varepsilon)\omega]$. An ε-approximate φ-quantile summary of $W(O)$ is $o_1$ and a list of instances $o_{l_1}, \ldots, o_{l_{1/\phi}}$ such that $l_i \in [(i\phi - \varepsilon)\omega, (i\phi + \varepsilon)\omega]$ ($1 \le i \le 1/\phi$).
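The rank condition of Definition 6.5 can be checked mechanically. The helper below, `is_approx_summary`, is a hypothetical illustration: it takes the 1-based ranks $l_1, \ldots, l_{1/\phi}$ of the summary instances (not the instances themselves) and tests each against its allowed range. It uses exact fractions for φ and ε and assumes 1/φ is an integer.

```python
from fractions import Fraction


def is_approx_summary(omega, ranks, phi, eps):
    """Check Definition 6.5 for 1-based ranks l_1, ..., l_{1/phi} in a window of omega instances.

    phi and eps are Fractions so that the range endpoints are compared exactly.
    """
    m = 1 / phi
    assert m.denominator == 1, "assumes 1/phi is an integer"
    if len(ranks) != m:
        return False
    # l_i must lie in [(i*phi - eps) * omega, (i*phi + eps) * omega].
    return all(
        (i * phi - eps) * omega <= l <= (i * phi + eps) * omega
        for i, l in enumerate(ranks, start=1)
    )


phi, eps = Fraction(1, 4), Fraction(1, 50)  # eps * omega = 2 ranks of slack
print(is_approx_summary(100, [24, 51, 77, 100], phi, eps))  # True
print(is_approx_summary(100, [24, 51, 78, 100], phi, eps))  # False: 78 > 77
```

With ω = 100, φ = 1/4 and ε = 1/50, the exact quantile ranks are 25, 50, 75 and 100, and each summary rank may deviate from them by at most two positions.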

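The ε-approximate φ-quantile summary of $W(O)$ partitions the instances of $W(O)$ into $1/\phi$ intervals. The first interval is $t_1 = [o_1, o_{l_1}]$, and in general the i-th interval is $t_i = (q_{i-1}, q_i]$ ($1 < i \le 1/\phi$), where $q_i = o_{l_i}$. Because each of the endpoint ranks $l_{i-1}$ and $l_i$ may deviate from its target by up to $\varepsilon\omega$, the number of instances in each interval is in $[(\phi - 2\varepsilon)\omega, (\phi + 2\varepsilon)\omega]$. Since the membership probability of each instance is $1/\omega$, the membership probability of each interval is within $[\phi - 2\varepsilon, \phi + 2\varepsilon]$.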
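The partition into intervals can be sketched as follows. The function `interval_probabilities` is a hypothetical helper: it converts summary ranks into the instance count of each interval and multiplies by the per-instance membership probability 1/ω. It assumes the last rank equals ω, so that the final interval ends at the last instance of the window and the probabilities sum to 1.

```python
from fractions import Fraction


def interval_probabilities(omega, ranks):
    """Membership probabilities of the intervals induced by summary ranks l_1 < ... < l_m.

    t_1 = [o_1, o_{l_1}] holds l_1 instances and t_i = (o_{l_{i-1}}, o_{l_i}] holds
    l_i - l_{i-1} instances; each instance has membership probability 1/omega.
    Assumes ranks[-1] == omega so that every instance falls into some interval.
    """
    edges = [0] + list(ranks)
    return [Fraction(edges[i] - edges[i - 1], omega) for i in range(1, len(edges))]


probs = interval_probabilities(100, [24, 51, 77, 100])
print([str(p) for p in probs])  # ['6/25', '27/100', '13/50', '23/100']
```

Each probability stays close to φ = 1/4 here because every summary rank is close to its exact quantile rank.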

The computation of ε-approximate quantiles over data streams has been well studied [111, 112, 188, 189], and both deterministic and randomized methods have been proposed. In our implementation, we adopt the method of [188] for computing approximate quantile summaries over a sliding window; it is based on the GK-algorithm [112], which finds approximate quantiles over a data stream. The algorithm continuously outputs the ε-approximate quantiles in a sliding window with a space cost of $O\left(\frac{\log^2(\varepsilon\omega)}{\varepsilon^2}\right)$.
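For testing a space-efficient implementation, it can help to have a reference that produces the same kind of output. The class below, `NaiveWindowQuantiles`, is a hypothetical baseline and is not the method of [188] or the GK-algorithm: it stores the entire count-based window, so it uses O(ω) space, and it returns exact quantiles, which trivially satisfy the ε-approximate definition for any ε ≥ 0. It assumes smaller values rank higher.

```python
from bisect import bisect_left, insort
from collections import deque


class NaiveWindowQuantiles:
    """Exact quantile summary over a count-based sliding window of omega instances.

    Testing baseline only: it stores the whole window (O(omega) space), unlike
    the space-efficient sliding-window method of [188].
    """

    def __init__(self, omega):
        self.omega = omega
        self.window = deque()  # instances in arrival order, used to expire the oldest
        self.ranked = []       # the same instances kept in sorted (rank) order

    def insert(self, value):
        self.window.append(value)
        insort(self.ranked, value)
        if len(self.window) > self.omega:
            expired = self.window.popleft()
            # Equal values are interchangeable, so removing any copy is correct.
            del self.ranked[bisect_left(self.ranked, expired)]

    def summary(self, m):
        """Return o_1 followed by the exact (i/m)-quantiles, i = 1..m."""
        n = len(self.ranked)
        ranks = [-(-i * n // m) for i in range(1, m + 1)]  # ceil(i * n / m), 1-based
        return [self.ranked[0]] + [self.ranked[r - 1] for r in ranks]


w = NaiveWindowQuantiles(omega=4)
for v in [5, 1, 9, 3, 7, 2]:
    w.insert(v)  # 5 and 1 expire once the window exceeds four instances
print(w.summary(2))  # [2, 3, 9]
```

Each insertion costs O(ω) time because of the list shifts in `insort` and `del`, which is acceptable for a correctness oracle but not for the setting the chapter targets.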
