Probabilistic Ranking Queries on Uncertain Data - Ranking Queries on Uncertain Data

Database Reference

In-Depth Information

whose top- k probability values fail the threshold. Any tuples in R that have not been

seen should be tested against this largest membership probability.

Our last pruning rule is based on the observation that the sum of the top- k prob-

ability values of all tuples is exactly k . That is

T Pr k

∑

(

k .

∈

Theorem 5.6 (Pruning by total top- k probability). Let A be a set of tuples whose

top-k probability values have been computed. If

∑ t ∈ A Pr k

(

) >

−

p, then for every

tuple t ∈

A, Pr k

t ) <

(

Moreover, we have a tight stopping condition as follows.

Theorem 5.7 (A tight stopping condition). Let t 1

,...,

t m

,...,

t n be the tuples in the

ranking order. Assume L

t 1 ,...,

t m are read. Let LR be the set of open rules with

respect to t m + 1 . For any tuple t i (

)

j = 0 Pr ( L , j ) ;

−

1. if t i is not in any rule in LR, the top-k probability of t i Pr k

(

t i

) ≤

2. if

t i

rule

LR,

the

top-k

probability

t i

k − 1

j = 0 Pr ( L − t R left , j ) .

Pr k

(

t i ) ≤

max

LR (

−

(

t R left ))

∈

Proof. For item (1), consider the compressed dominant set T

(

t i )

of t i . L

⊆

(

t i )

Therefore,

j = 0 Pr ( L , j ) .

The equality holds if tuple t m + 1 is independent with membership probability 1.

For item (2), suppose t i is involved in an open rule R

j = 0 Pr ( T ( t i ) , j ) ≤

−

Pr k

(

t i )=

(

t i )

∈

LR . Pr

(

t i ) ≤

−

(

t R left )

Moreover, for the compressed dominant set T

(

t i )

of t i ,

(

−

t R left ) ⊆

(

t i )

. Therefore,

j = 0 Pr ( T ( t i ) , j ) ≤ ( 1 − Pr ( t R left ))

−

j = 0 Pr ( L − t R left , j )

−

Pr k

(

t i )=

(

t i )

The equality holds when tuple t m + 1 is involved in rule R with membership prob-

ability 1

−

(

t R left )

, where

j = 0 Pr ( L − t R left , j ) .

−

R =

R ∈ LR (

−

(

t R left ))

arg max

Theorem 5.7 provides two upper bounds for tuples that have not been seen

yet. If the upper bounds are both lower than the probability threshold p , then the

unseen tuples do not need to be checked. The two bounds are both tight: Con-

clusion 1 can be achieved if Pr

(

t 1

1, while Conclusion 2 can be achieved if

∈

(

t R left )

(

{

−

(

t R left ) }

t i

arg min R ∈ LR Pr

and Pr

t i

max R ∈ LR

Ranking Queries on Uncertain Data

Search WWH ::

Custom Search

Home