Database Reference
In-Depth Information
In the U -list of the prob-interval containing the probability threshold, finding the
tuples whose upper ranks are less than or equal to k requires O
time, where n
is the number of tuples in T . Similarly, in the L -list, finding the tuples whose lower
ranks are larger than k also takes O
(
log n
)
time. Let d be the number of tuples that
cannot be pruned or validated. Computing the top- k probabilities of those tuples
requires O
(
log n
)
(
kmd
)
time, where m is the number of rules in T .
5.5.2.2 Answering Top-
(
k
,
l
)
Queries
Example 5.7 (Answering Top-
queries). Consider the uncertain tuples in Ta-
ble 2.2 which are indexed in the PRist in Figure 5.3. To answer a top-
(
k
,
l
)
(
k
,
l
)
with
k
=
2 , we scan each prob-interval in Figure 5.3(a) from top down.
In prob-interval b 5 =(
3 and l
=
2 , which means Pr 2
0
.
8
,
1
.
0
]
, we find t 4 .
L 5 =
(
t 4 )
0
.
8
and Pr 3
3 , which means Pr 4
(
t 4 ) >
0
.
8 . At the same time, t j .
L 5 =
4 for 1
j
(
t j )
3 . Since for any tuple t, we have Pr 3
Pr 4
0
.
8 for 1
j
(
t
)
(
t
)
, it is clear that
Pr 3
Pr 3
3 ). Thus, t 4 is one of the answers to the query.
Using the similar procedure, we scan prob-interval b 4 and find that t 3 is another
answer. Since the query asks for the top- 2 results, the query answering procedure
stops.
(
t 4 ) <
(
t j )
( 1
j
query Q , we want to scan the tuples in the descending order
of their top- k probabilities. However, PRist does not store any exact top- k probabil-
ities. We scan the prob-intervals in the top-down manner instead. For each prob-
interval, we retrieve the tuples whose top- k probabilities lie in the prob-interval.
Obviously, for two prob-intervals b i and b j ( i
To answer a top-
(
k
,
l
)
j ), the top- k probabilities falling in
b i is always greater than the top- k probabilities in b j .
The query answering algorithm proceeds in three steps.
Step 1: Bounding For a prob-interval b i and a tuple t , we use Corollary 5.4 to
determine if the top- k probability of t lies in b i .
Step 2: Pruning and Validating We scan the prob-intervals from top down, and
use a buffer B to store the tuples whose top- k probabilities lie in the scanned prob-
intervals. The tuples not in B have smaller top- k probabilities than those in B .A
tuple may be pruned or validated according to whether it is in B and the number of
tuples in B .
>
Theorem 5.11 (Answering top-
queries). Let T be a set of uncertain tuples
indexed by PRist with granularity parameter h. Let b h ,
(
k
,
l
)
h) be
the prob-intervals that have been scanned and B contain all tuples whose top-k
probabilities lying in b h ,
b h 1 ...,
b i ( 1
i
b h 1 ...,
b i , then
|
|≥
(
,
)
1. Pruning :if
B
l and t
B, then t is not an answer to the top-
k
l
query;
2. Validating :if
|
B
|≤
l and t
B, then t is an answer to the top-
(
k
,
l
)
query.
B , Pr k
Pr k
Proof. We only need to prove that for any tuple t x
B and t y
(
t x ) <
(
t y )
.
i 1
h
i 1
h
B ,wehave Pr k
. Moreover, Pr k
Since t x
(
t x ) >
(
t x )
because t y
B . There-
fore, we have Pr k
Pr k
(
t x ) <
(
t y )
.
Search WWH ::




Custom Search