Database Reference
In-Depth Information
Algorithm 5.3
The exact algorithm with reordering and pruning techniques
Input:
an uncertain table
T
, a set of generation rules
, a top-
k
query
Q
k
f
, and a probability
R
threshold
p
Output:
Answer
(
Q
,
p
,
T
)
Method:
1: retrieve tuples in
T
in the ranking order one by one
2:
for all
t
i
∈
T
do
3:
compute
T
(
t
i
)
by rule-tuple compression and reordering
compute subset probability values and
Pr
k
(
t
i
)
4:
if
Pr
k
(
t
i
)
≥
p
then
6: output
t
i
7:
end if
8: check whether
t
i
can be used to prune future tuples
9:
if
all remaining tuples in
T
fail the probability threshold
then
10: exit
11:
end if
12:
end for
5:
In summary, the exact algorithm for PT-
k
query answering is shown in Algo-
rithm 5.3. We analyze the complexity of the algorithm as follows.
For a multi-tuple rule
R
:
t
r
1
,···,
t
r
m
where
t
r
1
,...,
t
r
m
are in the ranking order, let
span
is processed, we need to remove rule-
tuple
t
r
1
,...,
r
l
−
1
, and compute the subset probability values of the updated compressed
dominant sets. When the next tuple not involved in
R
is processed,
t
r
1
,...,
r
l
−
1
and
t
r
l
are combined. Thus, in the worst case, each multi-tuple rule causes the computation
of
O
(
R
)=
r
m
−
r
1
. When tuple
t
r
l
(
1
<
l
≤
m
)
subset probability values. Moreover, in the worst case where each
tuple
T
passes the probability threshold, all tuples in
T
have to be read at least once.
The time complexity of the whole algorithm is
O
(
2
k
·
span
(
R
))
(
))
.
As indicated by our experimental results, in practice the four pruning rules are ef-
fective. Often, only a very small portion of the tuples in
T
are retrieved and checked
before the exact answer to a PT-
k
query is obtained.
Interestingly, since PT-
k
query answering methods can be extended to evaluate
top-
(
kn
+
k
∑
R
∈R
span
R
queries (as discussed in Section 5.2.1), the pruning
techniques introduced in this section can be applied to answering top-
(
k
,
l
)
queries and top-
(
p
,
l
)
(
k
,
l
)
queries
and top-
(
p
,
l
)
queries as well.
5.3 A Sampling Method
One may trade off the accuracy of answers against the efficiency. In this section, we
present a simple yet effective sampling method for estimating top-
k
probabilities of
tuples.
For a tuple
t
, let
X
t
be a random variable as an indicator to the event that
t
is ranked top-
k
in possible worlds.
X
t
=
1if
t
is ranked in the top-
k
list, and
X
t
=
0 otherwise. Apparently, the top-
k
probability of
t
is the expectation of
X
t
,