Probabilistic Ranking Queries on Uncertain Data - Ranking Queries on Uncertain Data

Database Reference

In-Depth Information

Algorithm 5.3 The exact algorithm with reordering and pruning techniques

Input: an uncertain table T , a set of generation rules

, a top- k query Q k f , and a probability

threshold p

Output: Answer ( Q , p , T )

Method:

1: retrieve tuples in T in the ranking order one by one

2: for all t i ∈ T do

compute T ( t i ) by rule-tuple compression and reordering

compute subset probability values and Pr k

( t i )

if Pr k

( t i ) ≥ p then

6: output t i

7: end if

8: check whether t i can be used to prune future tuples

9: if all remaining tuples in T fail the probability threshold then

10: exit

11: end if

12: end for

In summary, the exact algorithm for PT- k query answering is shown in Algo-

rithm 5.3. We analyze the complexity of the algorithm as follows.

For a multi-tuple rule R : t r 1 ,···,

t r m where t r 1 ,...,

t r m are in the ranking order, let

span

is processed, we need to remove rule-

tuple t r 1 ,..., r l − 1 , and compute the subset probability values of the updated compressed

dominant sets. When the next tuple not involved in R is processed, t r 1 ,..., r l − 1 and t r l

are combined. Thus, in the worst case, each multi-tuple rule causes the computation

of O

(

r m −

r 1 . When tuple t r l (

≤

)

subset probability values. Moreover, in the worst case where each

tuple T passes the probability threshold, all tuples in T have to be read at least once.

The time complexity of the whole algorithm is O

(

2 k

span

(

))

(

))

As indicated by our experimental results, in practice the four pruning rules are ef-

fective. Often, only a very small portion of the tuples in T are retrieved and checked

before the exact answer to a PT- k query is obtained.

Interestingly, since PT- k query answering methods can be extended to evaluate

top-

(

∑ R ∈R span

queries (as discussed in Section 5.2.1), the pruning

techniques introduced in this section can be applied to answering top-

(

)

queries and top-

(

)

(

)

queries

and top-

(

)

queries as well.

5.3 A Sampling Method

One may trade off the accuracy of answers against the efficiency. In this section, we

present a simple yet effective sampling method for estimating top- k probabilities of

tuples.

For a tuple t , let X t be a random variable as an indicator to the event that t

is ranked top- k in possible worlds. X t =

1if t is ranked in the top- k list, and

X t =

0 otherwise. Apparently, the top- k probability of t is the expectation of X t ,

Ranking Queries on Uncertain Data

Search WWH ::

Custom Search

Home