Database Reference
In-Depth Information
5.5.3 PRist+ and a Fast Construction Algorithm
We can reduce the construction time of PRist by bounding top- k probabilities using
the binomial distribution.
Consider a tuple t
. If there is any tuple
or generation rule-tuple with probability 1, we can remove the tuple from T
T and its compressed dominant set T
(
t
)
(
t
)
, and
(
)
compute the top-
probability of t . Thus, we can assume that the membership
probability of any tuple or rule-tuple in T
k
1
(
)
t
is smaller than 1.
Theorem 5.14 (Bounding the probability). For a tuple t
T , let T
(
t
)
be the com-
pressed dominant set of t. Then,
p max )
0
F
(
k ; N
,
Pr
(
T
(
t
) ,
j
)
F
(
k ; N
,
p min )
(5.3)
j
k
where p max and p min are the greatest and the smallest probabilities of the tuples/rule-
tuples in T
(
t
)(
0
<
p min
p max <
1
)
, N is the number of tuples/rule-tuples in T
(
t
)
,
and F is the cumulative distribution function of the binomial distribution.
Proof. We first prove the left side of Inequality 5.3. For a tuple set S , let Pr
(
S
,≤
k
)
. For any tuple t
t )
denote
k Pr
(
S
,
j
)
T
(
t
)
, Pr
(
p max .
0
j
t }
and T (
Consider tuple set S
=
T
(
t
) −{
t
)=
S
+
t max where Pr
(
t max )=
p max .
From Theorem 5.2,
t )
t ))
Pr
(
T
(
t
) ,≤
k
)=
Pr
(
Pr
(
S
,≤
k
1
)+(
1
Pr
(
Pr
(
S
,≤
k
)
;
and
T (
(
) ,≤
)=
(
)
(
,≤
)+(
(
))
(
,≤
) .
Pr
t
k
Pr
t max
Pr
S
k
1
1
Pr
t max
Pr
S
k
Then,
T (
t )
Pr
(
T
(
t
) ,≤
k
)
Pr
(
t
) ,≤
k
)=[
Pr
(
Pr
(
t max )] × [
Pr
(
S
,≤
k
1
)
Pr
(
S
,≤
k
)] .
t )
Since Pr
(
Pr
(
t max )
and Pr
(
S
,≤
k
1
)
Pr
(
S
,≤
k
)
,wehave Pr
(
T
(
t
) ,≤
k
)
T (
Pr
(
.
By replacing each tuple/rule-tuple in T
t
) ,≤
k
)
with t max , we obtain a set of tuples
with the same probability p max , whose subset probabilities follows the binomial
distribution F
(
t
)
. Thus, the left side of Inequality 5.3 is proved.
The right side of Inequality 5.3 can be proved similarly.
(
k ; N
,
p max )
Moreover, Hoeffding [183] gave the following bound.
Theorem 5.15 (Extrema [183]). For a tuple t
T and its compressed dominant set
t f t
T
(
t
)
, let
μ = ∑
Pr
(
)
. Then,
t
T
(
t
)
, N )
k
j
1.
0 Pr
(
T
(
t
) ,
j
)
F
(
k ; N
when 0
k
μ
1 ; and
=
k j
, N )
2.
0 Pr
(
T
(
t
) ,
j
)
F
(
k ; N
when
μ
k
N,
=
 
Search WWH ::




Custom Search