Database Reference
In-Depth Information
where all tuples are independent. Then, we can compute Pr k
T
(
t i )
(
t i )
on T
(
t i ) ∪{
t i }
using Theorem 5.2.
5.2 Exact Query Answering Methods
In this section, we first discuss how to answer probabilistic ranking queries based
on the rank- k probability computation method in Section 5.1. Then, we propose two
techniques to speed up the query answering methods.
5.2.1 Query Answering Framework
Straightforwardly, to answer a PT- k query with probability threshold p , we simply
scan all tuples in T in the ranking order and compute the top- k probability of each
tuple. The tuples whose top- k probabilities passing the threshold p are returned as
the answers.
It is shown in Corollary 2.3 that, for a PT- k query and an RT- k query with the
same rank parameter k and probability threshold p values, the answers are identical.
Therefore, the above method can be directly used to answer an RT- k query.
The method can be extended to answer top-
queries as following. Again,
we scan the tuples in the ranking order. A buffer B that contains at most l tuples is
maintained during the scan. At the beginning, the first l tuples t 1 ,···,
(
k
,
l
)
t l are added
into the buffer. The probability threshold p is set to the minimal probability value
of t 1 ,···,
Pr k
t l . That is, p
=
min 1 i l {
(
t i ) }
. Then, for each tuple t i ( l
+
1
i
≤|
T
|
),
if Pr k
p , then the tuple in B with the minimal top- k probability is replaced
by t i . The probability threshold p is again set to the minimal top- k probability
min t B
(
t i )
Pr k
{
(
) }
of the tuples in the buffer. At the end of the scan, all tuples in the
buffer are returned.
To answer a top-
t
query, we can use the similar procedure. The only differ-
ence is that a buffer B that contains the tuples with the smallest p -ranks during the
scan is maintained. The rank threshold k is set to the largest p -rank of all tuples in
the buffer. That is, k
(
p
,
l
)
.
From the above discussion, it is clear that all four types of queries discussed in
this work can be answered based on top- k probability calculation. However, can we
further improve the efficiency of the query answering methods? In Section 5.2.2,
we discuss how to reuse subset probability calculation during computing the top-
k probability values for all tuples. In Section 5.2.3, we develop several effective
pruning techniques to improve the efficiency.
=
max t B {
MR p (
t
) }
Search WWH ::




Custom Search