Database Reference
In-Depth Information
In the
U
-list of the prob-interval containing the probability threshold, finding the
tuples whose upper ranks are less than or equal to
k
requires
O
time, where
n
is the number of tuples in
T
. Similarly, in the
L
-list, finding the tuples whose lower
ranks are larger than
k
also takes
O
(
log
n
)
time. Let
d
be the number of tuples that
cannot be pruned or validated. Computing the top-
k
probabilities of those tuples
requires
O
(
log
n
)
(
kmd
)
time, where
m
is the number of rules in
T
.
5.5.2.2 Answering Top-
(
k
,
l
)
Queries
Example 5.7 (Answering Top-
queries). Consider the uncertain tuples in Ta-
ble 2.2 which are indexed in the
PRist
in Figure 5.3. To answer a top-
(
k
,
l
)
(
k
,
l
)
with
k
=
2
, we scan each prob-interval in Figure 5.3(a) from top down.
In prob-interval b
5
=(
3
and l
=
2
, which means Pr
2
0
.
8
,
1
.
0
]
, we find t
4
.
L
5
=
(
t
4
)
≤
0
.
8
and Pr
3
3
, which means Pr
4
(
t
4
)
>
0
.
8
. At the same time, t
j
.
L
5
=
4
for
1
≤
j
≤
(
t
j
)
≤
3
. Since for any tuple t, we have Pr
3
Pr
4
0
.
8
for
1
≤
j
≤
(
t
)
≤
(
t
)
, it is clear that
Pr
3
Pr
3
3
). Thus, t
4
is one of the answers to the query.
Using the similar procedure, we scan prob-interval b
4
and find that t
3
is another
answer. Since the query asks for the top-
2
results, the query answering procedure
stops.
(
t
4
)
<
(
t
j
)
(
1
≤
j
≤
query
Q
, we want to scan the tuples in the descending order
of their top-
k
probabilities. However,
PRist
does not store any exact top-
k
probabil-
ities. We scan the prob-intervals in the top-down manner instead. For each prob-
interval, we retrieve the tuples whose top-
k
probabilities lie in the prob-interval.
Obviously, for two prob-intervals
b
i
and
b
j
(
i
To answer a top-
(
k
,
l
)
j
), the top-
k
probabilities falling in
b
i
is always greater than the top-
k
probabilities in
b
j
.
The query answering algorithm proceeds in three steps.
Step 1: Bounding
For a prob-interval
b
i
and a tuple
t
, we use Corollary 5.4 to
determine if the top-
k
probability of
t
lies in
b
i
.
Step 2: Pruning and Validating
We scan the prob-intervals from top down, and
use a buffer
B
to store the tuples whose top-
k
probabilities lie in the scanned prob-
intervals. The tuples not in
B
have smaller top-
k
probabilities than those in
B
.A
tuple may be pruned or validated according to whether it is in
B
and the number of
tuples in
B
.
>
Theorem 5.11 (Answering top-
queries).
Let T be a set of uncertain tuples
indexed by
PRist
with granularity parameter h. Let b
h
,
(
k
,
l
)
h) be
the prob-intervals that have been scanned and B contain all tuples whose top-k
probabilities lying in b
h
,
b
h
−
1
...,
b
i
(
1
≤
i
≤
b
h
−
1
...,
b
i
, then
|
|≥
∈
(
,
)
1.
Pruning
:if
B
l and t
B, then t is not an answer to the top-
k
l
query;
2.
Validating
:if
|
B
|≤
l and t
∈
B, then t is an answer to the top-
(
k
,
l
)
query.
B
,
Pr
k
Pr
k
Proof.
We only need to prove that for any tuple
t
x
∈
B
and
t
y
∈
(
t
x
)
<
(
t
y
)
.
i
−
1
h
i
−
1
h
B
,wehave
Pr
k
. Moreover,
Pr
k
Since
t
x
∈
(
t
x
)
>
(
t
x
)
≤
because
t
y
∈
B
. There-
fore, we have
Pr
k
Pr
k
(
t
x
)
<
(
t
y
)
.