Probabilistic Ranking Queries on Uncertain Data - Ranking Queries on Uncertain Data

Database Reference

In-Depth Information

The space cost of a PRist is O

(

2 hn

)

, where n is the number of tuples and h is the

number of prob-intervals.

To reduce the space cost of PRist , if an entry

(

rank

)

appears in the U -list of

prob-interval b i and the L -list of prob-interval b i + 1 (

, we can let the two

lists share the entry. Moreover, if multiple entries in a list have the same rank, we

can compress those entries into one which carries one rank and multiple tuple ids.

A compressed PRist index is shown in Figure 5.3(b).

One reason that compressed PRist can achieve non-trivial saving is that, for a

tuple t with membership probability Pr

≤

)

(

)

(

)

,if Pr

falls in prob-interval b i , the up-

per and lower ranks of t in prob-intervals b i + 1

,...,

b h are identical and thus can be

shared.

In addition, PRist can be tailored according to different application requirements.

For example, in some scenarios, users may be only interested in the probabilistic

ranking queries with a small rank parameter k and a large probability threshold p .In

such a case, given a maximum rank parameter value k max and a minimum probability

threshold value p min in users' queries, only the prob-intervals

i − 1

p min

need to be stored. For each prob-interval, the U -list and L -list only contain the tuples

whose upper ranks are at most k max .

We can build a PRist index in two steps. In the first step, we compute the top- k

probabilities of all tuples using the methods in Section 5.2. The time complexity of

computing the top- i probabilities of all tuples for all 1

(

h ]

with

h >

m 2 n

≤

m is O

(

)

, where m

is the number of rules in T and n is the number of tuples in T .

In the second step, we construct a set of prob-intervals and compute the U -lists

and L -lists. For each tuple t , we scan its top- i (1

≤

m ) probabilities and fill up

the upper rank t

U j and lower rank t

L j for each prob-interval b j (1

≤

h ). Since

. We sort the entries in the

U -lists and the L -lists. Since there are 2 h lists, each of n entries, the total time cost

of sorting is O

there are n tuples and m ranks, the time cost is O

(

)

The overall time complexity of the basic construction algorithm is O

(

2 hn log n

)

m 2 n

(

m 2 n

2 hn log n

(

)

2 hn log n

5.5.2 Query Evaluation based on PRist

The query evaluation based on the PRist index follows three steps.

1. Bounding : for each tuple in question, we derive an upper bound and a lower

bound of the measure of interest;

2. Pruning and validating : The upper and lower bound derived in the bounding

phase are used to prune or validate tuples.

3. Evaluating : for those tuples that cannot be pruned or validated, the exact top-

k probabilities are calculated. Then, the tuples are evaluated with respect to the

query.

Ranking Queries on Uncertain Data

Search WWH ::

Custom Search

Home