Database Reference
In-Depth Information
5.5.3
PRist+
and a Fast Construction Algorithm
We can reduce the construction time of
PRist
by bounding top-
k
probabilities using
the binomial distribution.
Consider a tuple
t
. If there is any tuple
or generation rule-tuple with probability 1, we can remove the tuple from
T
∈
T
and its compressed dominant set
T
(
t
)
(
t
)
, and
(
−
)
compute the top-
probability of
t
. Thus, we can assume that the membership
probability of any tuple or rule-tuple in
T
k
1
(
)
t
is smaller than 1.
Theorem 5.14 (Bounding the probability).
For a tuple t
∈
T , let T
(
t
)
be the com-
pressed dominant set of t. Then,
p
max
)
≤
∑
0
F
(
k
;
N
,
Pr
(
T
(
t
)
,
j
)
≤
F
(
k
;
N
,
p
min
)
(5.3)
≤
j
≤
k
where p
max
and p
min
are the greatest and the smallest probabilities of the tuples/rule-
tuples in T
(
t
)(
0
<
p
min
≤
p
max
<
1
)
, N is the number of tuples/rule-tuples in T
(
t
)
,
and F is the cumulative distribution function of the binomial distribution.
Proof.
We first prove the left side of Inequality 5.3. For a tuple set
S
, let
Pr
(
S
,≤
k
)
. For any tuple
t
∈
t
)
≤
denote
k
Pr
(
S
,
j
)
T
(
t
)
,
Pr
(
p
max
.
∑
0
≤
j
≤
t
}
and
T
(
Consider tuple set
S
=
T
(
t
)
−{
t
)=
S
+
t
max
where
Pr
(
t
max
)=
p
max
.
From Theorem 5.2,
t
)
t
))
Pr
(
T
(
t
)
,≤
k
)=
Pr
(
Pr
(
S
,≤
k
−
1
)+(
1
−
Pr
(
Pr
(
S
,≤
k
)
;
and
T
(
(
)
,≤
)=
(
)
(
,≤
−
)+(
−
(
))
(
,≤
)
.
Pr
t
k
Pr
t
max
Pr
S
k
1
1
Pr
t
max
Pr
S
k
Then,
T
(
t
)
−
Pr
(
T
(
t
)
,≤
k
)
−
Pr
(
t
)
,≤
k
)=[
Pr
(
Pr
(
t
max
)]
×
[
Pr
(
S
,≤
k
−
1
)
−
Pr
(
S
,≤
k
)]
.
t
)
≤
Since
Pr
(
Pr
(
t
max
)
and
Pr
(
S
,≤
k
−
1
)
≤
Pr
(
S
,≤
k
)
,wehave
Pr
(
T
(
t
)
,≤
k
)
≥
T
(
Pr
(
.
By replacing each tuple/rule-tuple in
T
t
)
,≤
k
)
with
t
max
, we obtain a set of tuples
with the same probability
p
max
, whose subset probabilities follows the binomial
distribution
F
(
t
)
. Thus, the left side of Inequality 5.3 is proved.
The right side of Inequality 5.3 can be proved similarly.
(
k
;
N
,
p
max
)
Moreover, Hoeffding [183] gave the following bound.
Theorem 5.15 (Extrema [183]).
For a tuple t
∈
T and its compressed dominant set
t
≺
f
t
T
(
t
)
, let
μ = ∑
Pr
(
)
. Then,
t
∈
T
(
t
)
,
N
)
k
j
1.
∑
0
Pr
(
T
(
t
)
,
j
)
≤
F
(
k
;
N
when
0
≤
k
≤
μ
−
1
; and
=
k
j
,
N
)
2.
0
Pr
(
T
(
t
)
,
j
)
≥
F
(
k
;
N
when
μ
≤
k
≤
N,
∑
=