Database Reference
In-Depth Information
whose top- k probability values fail the threshold. Any tuples in R that have not been
seen should be tested against this largest membership probability.
Our last pruning rule is based on the observation that the sum of the top- k prob-
ability values of all tuples is exactly k . That is
T Pr k
(
t
)=
k .
t
Theorem 5.6 (Pruning by total top- k probability). Let A be a set of tuples whose
top-k probability values have been computed. If
t A Pr k
(
t
) >
k
p, then for every
tuple t
A, Pr k
t ) <
(
p.
Moreover, we have a tight stopping condition as follows.
Theorem 5.7 (A tight stopping condition). Let t 1
,...,
t m
,...,
t n be the tuples in the
ranking order. Assume L
=
t 1 ,...,
t m are read. Let LR be the set of open rules with
respect to t m + 1 . For any tuple t i (
i
>
m
)
,
k
1
j = 0 Pr ( L , j ) ;
1. if t i is not in any rule in LR, the top-k probability of t i Pr k
(
t i
)
2. if
t i
is
in
a
rule
in
LR,
the
top-k
probability
of
t i
k 1
j = 0 Pr ( L t R left , j ) .
Pr k
(
t i )
max
R
LR (
1
Pr
(
t R left ))
Proof. For item (1), consider the compressed dominant set T
(
t i )
of t i . L
T
(
t i )
.
Therefore,
1
j = 0 Pr ( L , j ) .
The equality holds if tuple t m + 1 is independent with membership probability 1.
For item (2), suppose t i is involved in an open rule R
k
1
j = 0 Pr ( T ( t i ) , j )
k
Pr k
(
t i )=
Pr
(
t i )
LR . Pr
(
t i )
1
Pr
(
t R left )
.
Moreover, for the compressed dominant set T
(
t i )
of t i ,
(
L
t R left )
T
(
t i )
. Therefore,
k
1
j = 0 Pr ( T ( t i ) , j ) ( 1 Pr ( t R left ))
k
1
j = 0 Pr ( L t R left , j )
Pr k
(
t i )=
Pr
(
t i )
The equality holds when tuple t m + 1 is involved in rule R with membership prob-
ability 1
Pr
(
t R left )
, where
k
1
j = 0 Pr ( L t R left , j ) .
R =
R LR (
(
t R left ))
arg max
1
Pr
Theorem 5.7 provides two upper bounds for tuples that have not been seen
yet. If the upper bounds are both lower than the probability threshold p , then the
unseen tuples do not need to be checked. The two bounds are both tight: Con-
clusion 1 can be achieved if Pr
(
t 1
)=
1, while Conclusion 2 can be achieved if
(
t R left )
(
)=
{
(
t R left ) }
t i
arg min R LR Pr
and Pr
t i
max R LR
1
Pr
.
 
Search WWH ::




Custom Search