Database Reference
In-Depth Information
The space cost of a PRist is O
(
2 hn
)
, where n is the number of tuples and h is the
number of prob-intervals.
To reduce the space cost of PRist , if an entry
(
o
,
rank
)
appears in the U -list of
prob-interval b i and the L -list of prob-interval b i + 1 (
, we can let the two
lists share the entry. Moreover, if multiple entries in a list have the same rank, we
can compress those entries into one which carries one rank and multiple tuple ids.
A compressed PRist index is shown in Figure 5.3(b).
One reason that compressed PRist can achieve non-trivial saving is that, for a
tuple t with membership probability Pr
1
i
<
h
)
(
)
(
)
t
,if Pr
t
falls in prob-interval b i , the up-
per and lower ranks of t in prob-intervals b i + 1
,...,
b h are identical and thus can be
shared.
In addition, PRist can be tailored according to different application requirements.
For example, in some scenarios, users may be only interested in the probabilistic
ranking queries with a small rank parameter k and a large probability threshold p .In
such a case, given a maximum rank parameter value k max and a minimum probability
threshold value p min in users' queries, only the prob-intervals
i 1
h
i
i
p min
need to be stored. For each prob-interval, the U -list and L -list only contain the tuples
whose upper ranks are at most k max .
We can build a PRist index in two steps. In the first step, we compute the top- k
probabilities of all tuples using the methods in Section 5.2. The time complexity of
computing the top- i probabilities of all tuples for all 1
(
,
h ]
with
h >
m 2 n
i
m is O
(
)
, where m
is the number of rules in T and n is the number of tuples in T .
In the second step, we construct a set of prob-intervals and compute the U -lists
and L -lists. For each tuple t , we scan its top- i (1
i
m ) probabilities and fill up
the upper rank t
.
U j and lower rank t
.
L j for each prob-interval b j (1
j
h ). Since
. We sort the entries in the
U -lists and the L -lists. Since there are 2 h lists, each of n entries, the total time cost
of sorting is O
there are n tuples and m ranks, the time cost is O
(
mn
)
.
The overall time complexity of the basic construction algorithm is O
(
2 hn log n
)
m 2 n
(
+
+
mn
m 2 n
2 hn log n
)=
O
(
+
)
2 hn log n
.
5.5.2 Query Evaluation based on PRist
The query evaluation based on the PRist index follows three steps.
1. Bounding : for each tuple in question, we derive an upper bound and a lower
bound of the measure of interest;
2. Pruning and validating : The upper and lower bound derived in the bounding
phase are used to prune or validate tuples.
3. Evaluating : for those tuples that cannot be pruned or validated, the exact top-
k probabilities are calculated. Then, the tuples are evaluated with respect to the
query.
 
Search WWH ::




Custom Search