Probabilistic Ranking Queries on Uncertain Data - Ranking Queries on Uncertain Data

Database Reference

In-Depth Information

Anaıve method of answering probabilistic ranking queries is to enumerate all

possible worlds and apply the query to each possible world. Then, we can compute

the top- k probability and p -rank of each tuple and select the tuples satisfying the

queries. Unfortunately, the naıve method is inefficient since, as discussed before,

there can be a huge number of possible worlds on an uncertain table. In [49], Dalvi

and Suciu showed that even the problem of counting all possible worlds with distinct

top- k lists is # P -Complete. Therefore, enumerating all possible worlds is too costly

on large uncertain data sets. That motivates our development of efficient algorithms

which avoid searching all possible worlds.

In this chapter, we first discuss the efficient top- k probability computation in

Section 5.1 and present an efficient exact algorithm in Section 5.2. Then, we de-

velop a fast sampling algorithm and a Poisson approximation based algorithm in

Sections 5.3 and 5.4, respectively. Last, to support efficient online query answering,

we propose PRist+ , a compact index, in Section 5.5. An efficient index construc-

tion algorithm and efficient query answering methods are developed for PRist+ .An

empirical study in Section 5.6 using real and synthetic data sets verifies the effec-

tiveness of probabilistic ranking queries and the efficiency of our methods.

5.1 Top- k Probability Computation

In this section, we first introduce how to compute the exact rank- k probability val-

ues. Top- k probabilities and p -ranks can be directly derived from rank- k probabili-

ties.

5.1.1 The Dominant Set Property

Hereafter, by default we consider a top- k selection query Q P , f on an uncertain table

T . P

(

)= {

∈

∧

(

}

true

is the set of tuples satisfying the query predicate.

(

)

(

)

carries the same member-

ship probability as in T . Moreover, a generation rule R in T is projected to P

is also an uncertain table where each tuple in P

(

)

removing all tuples from R that are not in P ( T )

contains all tuples satisfying the query, as well as the membership proba-

bilities and the generation rules. Removing tuples not in P

(

)

(

)

does not affect the

rank- k probabilities of the tuples in P

)

in computing rank- k probabilities. To keep our discuss simple, we use T to denote

the set of tuples satisfying query predicate P .

For a tuple t

(

)

. Therefore, we only need to consider P

(

∈

T and a possible world W such that t

∈

W , whether t

∈

W f (

)

depends only on how many other tuples in T ranked higher than t appear in W .

Definition 5.1 (Dominant set). Given a scoring function f on a probabilistic table

T , for a tuple t

T , the dominant set of t is the subset of tuples in T that are ranked

higher than t , i.e., S t = {

∈

t |

t ∈

t ≺ f t

∧

}

Ranking Queries on Uncertain Data

Search WWH ::

Custom Search

Home