Probabilistic Ranking Queries on Uncertain Data - Ranking Queries on Uncertain Data - page 108

Database Reference

In-Depth Information

U:

L: 2 (t4)

4 (t4)

(0.8,1.0]

1

t4

0.8

U:

L:

3 (t4)

2 (t4)

(0.6,0.8]

0.6

U:

L:

3 (t4)

1 (t4)

(0.4,0.6]

0.4

0.2

U:

L:

U:

L:

2 (t4)

1 (t4)

2 (t4)

0 (t4)

(0.2,0.4]

0

0

1

2

3

4

5

Rank k

(0,0.2]

(a) Top- k probabilities of tuple t 4 .

(b) Index entries.

Fig. 5.2 The index entries in PRist for tuple t 4 .

To support online query answering, we need to index the top- k probabilities and

the p -ranks of tuples so that the checking operations can be conducted efficiently.

Instead of building two different indexes for top- k probabilities and p -ranks sepa-

rately, can we kill two birds with one stone?

One critical observation is that, for a tuple, the top- k probabilities and the p -ranks

can be derived from each other. We propose PRist , a list of probability intervals, to

store the rank information for tuples.

Example 5.4 (Indexing top-k probabilities). Let us consider how to index the uncer-

tain tuples in Table 2.2.

Figure 5.2(a) shows the top-k probabilities of tuple t 4 with respect to different

values of k. Interestingly, it can also be used to retrieve the p-rank of t 4 : for a given

probability p, we can draw a horizontal line for top-k probability

p, and then

check where the horizontal line cut the curve in Figure 5.2(a). The point right below

the horizontal line gives the answer k.

Storing the top-k probabilities for all possible k can be very costly, since k is in

range

=

where m is the number of rules in the data set. To save space, we can

divide the domain of top-k probabilities

[

1

,

m

]

into h prob-intervals .

In Figure 5.2(a), we partition the probability range

(

0

,

1

]

(

0

,

1

]

to 5 prob-intervals:

(

. For each interval, we record

for each tuple a lower bound and an upper bound of the ranks whose corresponding

top-k probabilities lie in the prob-interval.

For example, the top- 1 and top- 2 probabilities of t 4 are 0

0

,

0

.

2

]

,

(

0

.

2

,

0

.

4

]

,

(

0

.

4

,

0

.

6

]

,

(

0

.

6

,

0

.

8

]

, and

(

0

.

8

,

1

.

0

]

.

.

45 , respec-

tively. Therefore, a lower bound and an upper bound of ranks of t 4 in interval

0945 and 0

]

are 0 and 2 , respectively. Specifically, we define the top- 0 probability of any tuples

as 0 . Moreover, we choose rank k

(

,

.

0

0

2

=

2 as the upper bound of the 0

.

2 -rank of t 4 , since

the top- 2 probability of t 4 is greater than 0

.

2 and all top-i probabilities of t 4 for

i

<

2 are smaller than 0

.

2 . The lower bound of the 0 -rank probability of t 4 is chosen

similarly.

Next Page

Ranking Queries on Uncertain Data

Search WWH ::

Custom Search

Home