Database Reference
In-Depth Information
whose top-
k
probability values fail the threshold. Any tuples in
R
that have not been
seen should be tested against this largest membership probability.
Our last pruning rule is based on the observation that the sum of the top-
k
prob-
ability values of all tuples is exactly
k
. That is
T
Pr
k
∑
(
t
)=
k
.
t
∈
Theorem 5.6 (Pruning by total top-
k
probability).
Let A be a set of tuples whose
top-k probability values have been computed. If
∑
t
∈
A
Pr
k
(
t
)
>
k
−
p, then for every
tuple t
∈
A, Pr
k
t
)
<
(
p.
Moreover, we have a tight stopping condition as follows.
Theorem 5.7 (A tight stopping condition).
Let t
1
,...,
t
m
,...,
t
n
be the tuples in the
ranking order. Assume L
=
t
1
,...,
t
m
are read. Let LR be the set of open rules with
respect to t
m
+
1
. For any tuple t
i
(
i
>
m
)
,
k
1
j
=
0
Pr
(
L
,
j
)
;
−
1. if t
i
is not in any rule in LR, the top-k probability of t
i
Pr
k
(
t
i
)
≤
2. if
t
i
is
in
a
rule
in
LR,
the
top-k
probability
of
t
i
k
−
1
j
=
0
Pr
(
L
−
t
R
left
,
j
)
.
Pr
k
(
t
i
)
≤
max
R
LR
(
1
−
Pr
(
t
R
left
))
∈
Proof.
For item (1), consider the compressed dominant set
T
(
t
i
)
of
t
i
.
L
⊆
T
(
t
i
)
.
Therefore,
1
j
=
0
Pr
(
L
,
j
)
.
The equality holds if tuple
t
m
+
1
is independent with membership probability 1.
For item (2), suppose
t
i
is involved in an open rule
R
k
1
j
=
0
Pr
(
T
(
t
i
)
,
j
)
≤
−
k
−
Pr
k
(
t
i
)=
Pr
(
t
i
)
∈
LR
.
Pr
(
t
i
)
≤
1
−
Pr
(
t
R
left
)
.
Moreover, for the compressed dominant set
T
(
t
i
)
of
t
i
,
(
L
−
t
R
left
)
⊆
T
(
t
i
)
. Therefore,
k
1
j
=
0
Pr
(
T
(
t
i
)
,
j
)
≤
(
1
−
Pr
(
t
R
left
))
−
k
1
j
=
0
Pr
(
L
−
t
R
left
,
j
)
−
Pr
k
(
t
i
)=
Pr
(
t
i
)
The equality holds when tuple
t
m
+
1
is involved in rule
R
with membership prob-
ability 1
−
Pr
(
t
R
left
)
, where
k
1
j
=
0
Pr
(
L
−
t
R
left
,
j
)
.
−
R
=
R
∈
LR
(
−
(
t
R
left
))
arg max
1
Pr
Theorem 5.7 provides two upper bounds for tuples that have not been seen
yet. If the upper bounds are both lower than the probability threshold
p
, then the
unseen tuples do not need to be checked. The two bounds are both tight: Con-
clusion 1 can be achieved if
Pr
(
t
1
)=
1, while Conclusion 2 can be achieved if
∈
(
t
R
left
)
(
)=
{
−
(
t
R
left
)
}
t
i
arg min
R
∈
LR
Pr
and
Pr
t
i
max
R
∈
LR
1
Pr
.