Probabilistic Ranking Queries on Uncertain Data - Ranking Queries on Uncertain Data

Database Reference

In-Depth Information

In the U -list of the prob-interval containing the probability threshold, finding the

tuples whose upper ranks are less than or equal to k requires O

time, where n

is the number of tuples in T . Similarly, in the L -list, finding the tuples whose lower

ranks are larger than k also takes O

(

log n

)

time. Let d be the number of tuples that

cannot be pruned or validated. Computing the top- k probabilities of those tuples

requires O

(

log n

)

(

kmd

)

time, where m is the number of rules in T .

5.5.2.2 Answering Top-

(

)

Queries

Example 5.7 (Answering Top-

queries). Consider the uncertain tuples in Ta-

ble 2.2 which are indexed in the PRist in Figure 5.3. To answer a top-

(

)

(

)

with

2 , we scan each prob-interval in Figure 5.3(a) from top down.

In prob-interval b 5 =(

3 and l

2 , which means Pr 2

]

, we find t 4 .

L 5 =

(

t 4 ) ≤

and Pr 3

3 , which means Pr 4

(

t 4 ) >

8 . At the same time, t j .

L 5 =

4 for 1

≤

(

t j ) ≤

3 . Since for any tuple t, we have Pr 3

Pr 4

8 for 1

≤

(

) ≤

(

)

, it is clear that

Pr 3

3 ). Thus, t 4 is one of the answers to the query.

Using the similar procedure, we scan prob-interval b 4 and find that t 3 is another

answer. Since the query asks for the top- 2 results, the query answering procedure

stops.

(

t 4 ) <

(

t j )

( 1

≤

query Q , we want to scan the tuples in the descending order

of their top- k probabilities. However, PRist does not store any exact top- k probabil-

ities. We scan the prob-intervals in the top-down manner instead. For each prob-

interval, we retrieve the tuples whose top- k probabilities lie in the prob-interval.

Obviously, for two prob-intervals b i and b j ( i

To answer a top-

(

)

j ), the top- k probabilities falling in

b i is always greater than the top- k probabilities in b j .

The query answering algorithm proceeds in three steps.

Step 1: Bounding For a prob-interval b i and a tuple t , we use Corollary 5.4 to

determine if the top- k probability of t lies in b i .

Step 2: Pruning and Validating We scan the prob-intervals from top down, and

use a buffer B to store the tuples whose top- k probabilities lie in the scanned prob-

intervals. The tuples not in B have smaller top- k probabilities than those in B .A

tuple may be pruned or validated according to whether it is in B and the number of

tuples in B .

Theorem 5.11 (Answering top-

queries). Let T be a set of uncertain tuples

indexed by PRist with granularity parameter h. Let b h ,

(

)

h) be

the prob-intervals that have been scanned and B contain all tuples whose top-k

probabilities lying in b h ,

b h − 1 ...,

b i ( 1

≤

b h − 1 ...,

b i , then

|≥

∈

(

)

1. Pruning :if

l and t

B, then t is not an answer to the top-

query;

2. Validating :if

|≤

l and t

∈

B, then t is an answer to the top-

(

)

query.

B , Pr k

Pr k

Proof. We only need to prove that for any tuple t x ∈

B and t y ∈

(

t x ) <

(

t y )

i − 1

B ,wehave Pr k

. Moreover, Pr k

Since t x ∈

(

t x ) >

(

t x ) ≤

because t y ∈

B . There-

fore, we have Pr k

Pr k

(

t x ) <

(

t y )

Ranking Queries on Uncertain Data

Search WWH ::

Custom Search

Home