Continuous Ranking Queries on Uncertain Streams - Ranking Queries on Uncertain Data

Database Reference

In-Depth Information

Inequality 6.6 can be shown similarly.

of a stream, we use U ( Pr k

( O )) − L ( Pr k

( O ))

For a sliding window W

(

)

as an ap-

proximation of Pr k

(

)

Theorem 6.9 (Approximation Quality). For a streamO and sliding windowW

(

)

let Pr k

U ( Pr k

( O )) − L ( Pr k

Pr k

( O ))

Pr k

(

, then

(

) −

(

) ≤ φ +ε

Proof. Following with Theorems 6.8 and 6.7, we have

U (

)) − L (

Pr k

(

)) ≤U (

(

)) −L (

(

))+

ε ≤

φ +

Theorem 6.9 follows with the above inequality directly.

6.3.3 Space Efficient Algorithms using Quantiles

The deterministic algorithm discussed in Section 6.1 and the sampling algorithm

proposed in Section 6.2 can both be extended using approximate quantile sum-

maries. Due to the loss of information in approximate quantile summaries, the ex-

tension of the deterministic algorithm only provides approximate answers.

Using approximate quantile summaries, each stream in a sliding window is rep-

resented by

intervals. The upper bound and the lower bound of the top- k proba-

bility of each interval can be computed using either the deterministic method or the

sampling method.

To compute the upper bound and the lower bound using the deterministic method,

we first sort the maximum values and the minimum values of all intervals in the

ranking order. Then, by scanning the sorted list once, we can compute the approxi-

mate upper bound and the approximate lower bound of the top- k probability of each

interval. For each stream O , we maintain the upper bound and the lower bound of

the number of instances in W

(

)

that have been scanned, and the upper bound and

the lower bound of Pr k

so far.

The time complexity of query evaluation in a sliding window using the above

extended algorithm is O

(

)

kn 2

φ +

n 1

(

log

(

φ ))

. The upper bound of approximation error

To compute the upper bound and the lower bound using the sampling method,

we draw m sample units uniformly at random as described in Section 6.2. The dif-

ference is, each sample unit contains an interval from each stream. For each interval

t of stream O , we define an indicator X t U

φ +ε

t |

t is an interval of

to the event that

{

O ,

O =

t .

t |

MIN

MAX

} <

k , and an indicator X t L

to the event that

{

is an interval of O ,

O =

t .

k . The indicator is set to 1 if the

event happens; otherwise, the indicator is set to 0. Then,

MAX

MIN

} <

Pr k

U (

(

)) =

[

X t U ]

, and

Pr k

is X t U

L (

(

)) =

[

X t L ]

. Suppose in sample unit s , the value of X t U

, then the ex-

i = 1 X s i

m ∑

pectation E

[

X t U ]

can be estimated by

. Similarly, E

[

X t L ]

can be estimated

1 X s i

∑

Ranking Queries on Uncertain Data

Search WWH ::

Custom Search

Home