Probabilistic Ranking Queries on Uncertain Data - Ranking Queries on Uncertain Data

Database Reference

In-Depth Information

To use Poisson approximation to evaluate a top- k query Q k

, we scan the tuples

in T in the ranking order. The sum of membership probabilities of the scanned tuples

is maintained in

(

)

. Moreover, for each generation rule R , let R left be the set of tuples

in R that are already scanned. Correspondingly, let

μ R be the sum of membership

probabilities of the tuples in R left .

When a tuple t is scanned, if t is an independent tuple, then the top- k probability

of t can be estimated using Pr

) Γ ( k , μ)

(

! .If t belongs to a gener-

ation rule R , then the top- k probability of t can be estimated by Pr

(

)

(

−

μ)=

(

−

)

μ )=

(

)

(

−

) Γ ( k , μ )

μ = μ − μ R . t is output if the estimated probability Pr k

passes

the probability threshold p . The scan stops when the general stopping condition in

Theorem 5.9 is satisfied.

In the Poisson approximation based method, we need to maintain the running

(

( k − 1 ) ! , where

(

)

and

μ R for each open rule R . Thus, the space requirement of the Poisson approx-

imation based method is O

( |R| +

)

, where

is the set of generation rules. The

, where n is the number of tuples read before the general

stopping condition is satisfied, which depends on parameter k , probability threshold

p and the probability distribution of the tuples and is independent from the size of

the uncertain table.

n )

time complexity is O

(

5.5 Online Query Answering

Since probabilistic ranking queries involve several parameters, a user may be in-

terested in how query results change as parameters vary. To support the interactive

analysis, online query answering is highly desirable.

In this section, we develop PRist+ (for p robabilistic r anking l ist s), an index for

online answering probabilistic ranking queries on uncertain data, which is compact

in space and efficient in construction.

We first describe the index data structure PRist and present a simple construction

algorithm. Then, we develop PRist+ , an advanced version of PRist that can be con-

structed more efficiently and achieve almost as good performance as PRist in query

answering.

5.5.1 The PRist Index

To answer probabilistic ranking queries, for a tuple t

T , a rank parameter k and a

probability threshold p , we often need to conduct the following two types of check-

ing operations.

∈

•

Top-k probability checking : is the top- k probability of t at least p ?

•

p-rank checking : is the p -rank of t at most k ?

Ranking Queries on Uncertain Data

Search WWH ::

Custom Search

Home