Probabilistic Ranking Queries on Uncertain Data - Ranking Queries on Uncertain Data

Database Reference

In-Depth Information

Step 3: Evaluating If, before prob-interval b i is scanned, B has l tuples such that

l <

l , but after b i is scanned, B has l or more tuples, we enter the evaluating step.

If B has exactly l tuples, we do not need to do anything. However, if B has more

than l tuples, we need to evaluate the exact top- k probabilities of those tuples added

to B from b i , and include into B only the top-

l )

(

−

tuples of the largest top- k

probabilities.

When a tuple can be added into the answer set without the evaluating step, the

time to retrieve the tuple is constant. There are at most l such tuples. Only the top- k

probabilities of the tuples in the last prob-interval need to be evaluated. Let d be

the number of such tuples. Then, the time complexity of evaluating those tuples is

(

kmd

)

, as computing the top- k probabilities of those tuples requires O

(

kmd

)

time.

The overall time complexity of query answering is O

(

kmd

)

5.5.2.3 Answering Top-

(

)

Queries

Example 5.8 (Answering top-

queries). Consider the uncertain tuples in Fig-

ure 5.3 again. How can we answer the top-

(

)

(

)

query with p

5 and l

2 ?

We only need to check prob-interval b 3 =(

]

that contains p. First, the

top- 2 highest ranks in b 3 .

U 3 are 3 for tuples t 3 and t 4 , which means MR 0 . 5 (

t 3 ) ≤

and MR 0 . 5 (

t 4 ) ≤

3 . Therefore, we can set 3 as the upper bound of the 0

5 -ranks, and

any tuple t with MR 0 . 5 >

3 cannot be the answer to the query.

Second, we scan list b 3 .

L 3 from the beginning. t 1 is added into a buffer B, since

t 1 .

3 and MR 0 . 5 might be smaller than or equal to 3 .t 3 and t 4 are added into

B due to the same reason. Then, when scanning t 2 , we find t 2 .

L 3 <

L 3 =

4 , which means

MR 0 . 5 (

4 . Thus, t 2 and any tuples following t 2 in the list (in this

example, no other tuples) can be pruned.

Last, the 0

t 2 ) ≥

MR 0 . 4 (

t 2 ) ≥

5 -ranks of the tuples in B (i.e., t 1 ,t 3 , and t 4 ) are calculated, and the

tuples with the top- 2 highest 0

5 -ranks are returned as the answer to the query,

which are t 1 and t 3 .

query, the three steps work as follows.

Step 1: Bounding The p -rank of a tuple can be bounded using the following rule.

To answer a top-

(

)

Theorem 5.12 (Bounding p -ranks). Let T be a set of uncertain tuples indexed by

PRist with granularity parameter h. For a tuple t

∈

S and p

∈ (

]

, let b i ( 1

≤

−

h <

h . Then, t

be the prob-interval such that p

∈

b i , i.e.,

≤

L i ≤

MR p (

) ≤

U i .

Proof. To prove Theorem 5.12, we need the following two lemmas.

Lemma 5.2 (Monotonicity). Let t be an uncertain tuple, and p 1 and p 2 be two real

values in

(

]

.Ifp 1 ≤

p 2 , then MR p 1 (

) ≤

MR p 2 (

)

k 1 . Then, Pr k 1

k 1 , Pr k

Proof. Let MR p 1 (

(

) ≥

p 1 . Moreover, for any k

(

) <

p 1 .

k 1 , Pr k

Since p 1 ≤

p 2 , for any k

(

) <

p 2 . Therefore, MR p 2 (

) ≥

k 1 =

MR p 1 (

)

Lemma 5.3. Let t be an uncertain tuple, k be a positive integer, and p be a real

values in

.IfPr k

(

]

(

) ≤

p, then MR p (

) ≥

Proof. Since Pr k

k , Pr x

(

) ≤

p , for any x

≤

(

) ≤

p . Thus, MR p (

) ≥

k .

Ranking Queries on Uncertain Data

Search WWH ::

Custom Search

Home