Database Reference
In-Depth Information
where N is the number of tuples and rule-tuples in T
(
t
)
, and F is the cumulative
distribution function of the binomial distribution.
Based on Theorems 5.14 and 5.15, we derive the following bound for the top- k
probability of tuple t
T .
Theorem 5.16 (Bounds of top- k probabilities). For a tuple t
T , the top-k proba-
bility of t satisfies
N )
Pr k
1. Pr
(
t
)
F
(
k
1; N
,
p max )
(
t
)
Pr
(
t
)
F
(
k
1; N
,
for 1
k
μ
;
, N )
Pr k
2. Pr
(
t
)
F
(
k
1; N
(
t
)
Pr
(
t
)
F
(
k
1; N
,
p min )
for
μ +
1
k
N
+
1 .
Proof. The theorem holds following from Theorems 5.14 and 5.15.
Example 5.9 (Bounding the probability). Consider the uncertain tuples in Table 2.2
again. T
(
t 3 )= {
t 1 ,
t 2 }
. The expected number of tuples in T
(
t 3 )
is
μ =
Pr
(
t 1 )+
Pr
(
is 2 .
Consider the top- 2 probability of t 3 . Since 2
t 2 )=
0
.
8 . The number of rules in T
(
t 3 )
,Pr 2
> μ
(
t 3 )
Pr
(
t 3 )
F
(
1; 2
,
0
.
4
)=
3 ,Pr 2
0
.
7
×
0
.
84
=
0
.
588 . Since min
{
Pr
(
t 1 ) ,
Pr
(
t 2 ) } =
0
.
(
t 3 )
Pr
(
t 3 )
F
(
1; 2
,
0
.
3
)=
637 . Therefore, Pr 2
0
.
7
×
0
.
91
=
0
.
(
t 3 )
is bounded in range
[
0
.
588
,
0
.
637
]
.
Since the cumulative probability distribution of the binomial distribution is easier
to calculate than top- k probabilities, we propose PRist+ , a variant of PRist using the
binomial distribution bounding technique.
The only difference between PRist+ and PRist is the upper and lower ranks in
the U -lists and L -lists. In PRist+ , we compute the upper and lower bounds of the
top- k probabilities of tuples using the binomial distributions. Then, the upper and
lower ranks are derived from the upper and lower bounds of the top- k probabilities.
Take the U -list in prob-interval b i as an example. In PRist , an entry in the U -
list consists of the tuple id t and the upper rank t
U i , such that Pr t . U i
i
h
.
(
t
) >
(if
Pr m
i
h ). Once the top- i probabilities of all ranks 1
(
t
) >
i
m for t are computed,
t
.
U i can be obtained by one scan.
In PRist+ , we store the upper rank t
.
U i as the smallest rank x such that the lower
bound of Pr x
i
h . Following with Theorem 5.16, the upper rank can
(
)
t
is greater than
, N )+
F 1
i
F 1
i
be calculated by x
1, where
F 1 is the binomial inverse cumulative distribution function. The lower rank of t
can be obtained similarly using Theorem 5.16.
Computing the upper and lower ranks for t in b i requires O
=
(
) ,
N
1or x
=
(
) ,
N
,
p max )+
h
·
Pr
(
t
h
·
Pr
(
t
time. Thus, the
overall complexity of computing the upper and lower ranks of all tuples in all prob-
intervals is O
(
1
)
(
2 hn
)
, where n is the total number of tuples. The complexity of sorting
the bound lists is O
(
2 hn log n
)
. The overall time complexity of constructing a PRist+
(
+
)=
(
)
index is O
.
Clearly, the construction time of PRist+ is much lower than PRist . The tradeoff
is that the rank bounds in PRist+ is looser than PRist . As will be shown in the next
section, all query answering methods on PRist can be applied on PRist+ . The looser
rank bounds in PRist+ does not affect the accuracy of the answers. They only make
a very minor difference in query answering time in our experiments.
2 hn
2 hn log n
O
hn log n
 
Search WWH ::




Custom Search