Database Reference
In-Depth Information
5.4 A Poisson Approximation Based Method
In this section, we further analyze the properties of top- k probability from the statis-
tics aspect, and derive a general stopping condition for query answering algorithms
which depends on parameter k and threshold p only and is independent from data
set size. We also devise an approximation method based on the Poisson approxima-
tion. Since the PT- k query answering methods can be extended to evaluate top-
(
,
)
k
l
(
,
)
queries and top-
queries as discussed in Section 5.2.1, the Poisson approxima-
tion based method can be used to answer top-
p
l
(
k
,
l
)
queries and top-
(
p
,
l
)
queries
too. We omit the details to avoid redundance.
5.4.1 Distribution of Top-k Probabilities
Let X 1 ,...,
X n be a set of independent random variables, such that Pr
(
X i =
1
)=
p i
i
i
and Pr
(
X i =
0
)=
1
p i (
1
i
n
)
. Let X
=
1 X i . Then, E
[
X
]=
1 p i .Ifall
=
=
X n are called Bernoulli trials , and X follows a binomial
distribution ; otherwise, X 1 ,...,
p i 's are identical, X 1 ,...,
X n are called Poisson trials , and X follows a Poisson
binomial distribution .
For a tuple t
k
j
T , the top- k probability of t is Pr k
(
t
)=
Pr
(
t
)
1 Pr
(
T
(
t
) ,
j
1
) ,
=
where Pr
is the compressed domi-
nant set of t . Moreover, the probability that fewer than k tuples appear in T
(
t
)
is the membership probability of t , T
(
t
)
(
t
)
is
k
j
.
If there is any tuple or generation rule-tuple in T
1 Pr
(
T
(
t
) ,
j
1
)
=
(
t
)
with probability 1, we can
remove the tuple from T
probability of t . Thus, we
can assume that the membership probability of any tuple or rule-tuple in T
(
t
)
, and compute the top-
(
k
1
)
(
t
)
is
smaller than 1.
To compute Pr k
(
)
(
)
t
, we construct a set of Poisson trials corresponding to T
t
as
follows. For each independent tuple t
(
)
T
t
, we construct a random trial X t whose
t )
success probability Pr
(
X t =
1
)=
Pr
(
. For each multi-tuple generation rule R
( R
T
(
t
) =
0), we combine the tuples in R
T
(
t
)
into a rule-tuple t R such
t )
that Pr
(
t R )= t R T ( t ) Pr
(
, and construct a random trial X t R
whose success
probability Pr
(
X t R =
1
)=
Pr
(
t R )
.
Let X 1
,...,
X n be the resulting trials. Since the independent tuples and rule-
(
)
tuples in T
t
are independent and their membership probabilities vary in general,
,...,
X 1
X n are independent and have unequal success probability values. They are
Poisson trials. Let X
n
=
i = 1 X i . Then, Pr
(
T
(
t
) ,
j
)=
Pr
(
X
=
j
)(
0
j
n
)
where
Pr
(
X
=
j
)
is the probability of j successes. Thus, the probability that t is ranked the
k -th is Pr
(
t
,
k
)=
Pr
(
t
)
Pr
(
X
=
k
1
)
. Moreover, the top- k probability of t is given
by Pr k
.
X follows the Poisson binomial distribution. Therefore, Pr
(
t
)=
Pr
(
t
)
Pr
(
X
<
k
)
(
t
,
k
)
also follows the
Poisson binomial distribution, and Pr k
(
t
)
follows the cumulative distribution func-
tion of Pr
(
t
,
k
)
.
Search WWH ::




Custom Search