Probabilistic Ranking Queries on Uncertain Data - Ranking Queries on Uncertain Data

Database Reference

In-Depth Information

In a Poisson binomial distribution X , the probability density of X is unimodal

(i.e., first increasing then decreasing), and attains its maximum at

μ =

[

]

[183].

Therefore, when the query parameter k varies from 1 to

(

) | +

1, Pr

(

)

follows

the similar trend.

Corollary 5.3 (Distribution of rank- k probability). For a tuple t

∈

⎧

⎨

if k

(

) | +

1 ;

1. Pr

(

)

(

) ,

if k

≤ μ −

1 ;

⎩

(

) ,

if k

≥ μ

2. arg max | T ( t ) | + 1

j =

t )

(

)=μ +

where

μ = ∑ t ∈ T ( t ) Pr

(

5.4.2 A General Stopping Condition

Corollary 5.3 shows that, given a tuple t and its compressed dominant set T

(

)

the most possible ranks of t are around

1, then

the top- k probability of t is small. Now, let us use this property to derive a general

stopping condition for query answering algorithms progressively reading tuples in

the ranking order. That is, once the stopping condition holds, all unread tuples that

cannot satisfy the query can be pruned. The stopping condition is independent from

the number of tuples in the data set, and dependent on only the query parameter k

and the probability threshold p .

μ +

1. In other words, if k

μ +

Theorem 5.9 (A General Stopping Condition). Given a top-k query Q k

(

)

and

t )

. Then, Pr k

probability thresh old p, for a tup le t

∈

T , let

μ = ∑ t ∈ T ( t ) Pr

(

) <

pif

ln 2 p +

2 k ln p .

Proof. To prove Theorem 5.9, we need Theorem 4.2 in [184].

ln p +

μ ≥

Lemma 5.1 (Chernoff Bound of Poisson Trials [184]). Let X 1

,...,

X n be indepen-

≤

[

dent Poisson trials such that, for 1

n, Pr

X i

p i , where 0

p i

1 . Then,

i = 1 X i ,

i = 1 p i , and 0

for X

= ∑

μ =

[

]= ∑

< ε ≤

1 , we have

e − με 2

[

< (

− ε)μ ] <

As discussed in Section 5.4.1, we can construct a set of Poisson trials correspond-

ing to the tuples in T

such that, for each tuple or rule-tuple t ∈

(

)

(

)

, there is a

t )

corresponding trial whose success probability is the same as Pr

(

. Moreover,

j = 0 Pr ( T ( t ) , j )= Pr [ X < k ]

−

For 0

< ε ≤

1, inequality Pr

[

] ≤

[

< (

− ε)μ ]

holds when

≤ (

− ε)μ

(5.1)

Ranking Queries on Uncertain Data

Search WWH ::

Custom Search

Home