Database Reference
In-Depth Information
where all tuples are independent. Then, we can compute
Pr
k
T
(
t
i
)
(
t
i
)
on
T
(
t
i
)
∪{
t
i
}
using Theorem 5.2.
5.2 Exact Query Answering Methods
In this section, we first discuss how to answer probabilistic ranking queries based
on the rank-
k
probability computation method in Section 5.1. Then, we propose two
techniques to speed up the query answering methods.
5.2.1 Query Answering Framework
Straightforwardly, to answer a PT-
k
query with probability threshold
p
, we simply
scan all tuples in
T
in the ranking order and compute the top-
k
probability of each
tuple. The tuples whose top-
k
probabilities passing the threshold
p
are returned as
the answers.
It is shown in Corollary 2.3 that, for a PT-
k
query and an RT-
k
query with the
same rank parameter
k
and probability threshold
p
values, the answers are identical.
Therefore, the above method can be directly used to answer an RT-
k
query.
The method can be extended to answer top-
queries as following. Again,
we scan the tuples in the ranking order. A buffer
B
that contains at most
l
tuples is
maintained during the scan. At the beginning, the first
l
tuples
t
1
,···,
(
k
,
l
)
t
l
are added
into the buffer. The probability threshold
p
is set to the minimal probability value
of
t
1
,···,
Pr
k
t
l
. That is,
p
=
min
1
≤
i
≤
l
{
(
t
i
)
}
. Then, for each tuple
t
i
(
l
+
1
≤
i
≤|
T
|
),
if
Pr
k
p
, then the tuple in
B
with the minimal top-
k
probability is replaced
by
t
i
. The probability threshold
p
is again set to the minimal top-
k
probability
min
t
∈
B
(
t
i
)
≥
Pr
k
{
(
)
}
of the tuples in the buffer. At the end of the scan, all tuples in the
buffer are returned.
To answer a top-
t
query, we can use the similar procedure. The only differ-
ence is that a buffer
B
that contains the tuples with the smallest
p
-ranks during the
scan is maintained. The rank threshold
k
is set to the largest
p
-rank of all tuples in
the buffer. That is,
k
(
p
,
l
)
.
From the above discussion, it is clear that all four types of queries discussed in
this work can be answered based on top-
k
probability calculation. However, can we
further improve the efficiency of the query answering methods? In Section 5.2.2,
we discuss how to reuse subset probability calculation during computing the top-
k
probability values for all tuples. In Section 5.2.3, we develop several effective
pruning techniques to improve the efficiency.
=
max
t
∈
B
{
MR
p
(
t
)
}