Database Reference
In-Depth Information
5.4 A Poisson Approximation Based Method
In this section, we further analyze the properties of top-
k
probability from the statis-
tics aspect, and derive a general stopping condition for query answering algorithms
which depends on parameter
k
and threshold
p
only and is independent from data
set size. We also devise an approximation method based on the Poisson approxima-
tion. Since the PT-
k
query answering methods can be extended to evaluate top-
(
,
)
k
l
(
,
)
queries and top-
queries as discussed in Section 5.2.1, the Poisson approxima-
tion based method can be used to answer top-
p
l
(
k
,
l
)
queries and top-
(
p
,
l
)
queries
too. We omit the details to avoid redundance.
5.4.1 Distribution of Top-k Probabilities
Let
X
1
,...,
X
n
be a set of independent random variables, such that
Pr
(
X
i
=
1
)=
p
i
i
i
and
Pr
(
X
i
=
0
)=
1
−
p
i
(
1
≤
i
≤
n
)
. Let
X
=
∑
1
X
i
. Then,
E
[
X
]=
∑
1
p
i
.Ifall
=
=
X
n
are called
Bernoulli trials
, and
X
follows a
binomial
distribution
; otherwise,
X
1
,...,
p
i
's are identical,
X
1
,...,
X
n
are called
Poisson trials
, and
X
follows a
Poisson
binomial distribution
.
For a tuple
t
k
j
T
, the top-
k
probability of
t
is
Pr
k
∈
(
t
)=
Pr
(
t
)
∑
1
Pr
(
T
(
t
)
,
j
−
1
)
,
=
where
Pr
is the compressed domi-
nant set of
t
. Moreover, the probability that fewer than
k
tuples appear in
T
(
t
)
is the membership probability of
t
,
T
(
t
)
(
t
)
is
k
j
∑
.
If there is any tuple or generation rule-tuple in
T
1
Pr
(
T
(
t
)
,
j
−
1
)
=
(
t
)
with probability 1, we can
remove the tuple from
T
probability of
t
. Thus, we
can assume that the membership probability of any tuple or rule-tuple in
T
(
t
)
, and compute the top-
(
k
−
1
)
(
t
)
is
smaller than 1.
To compute
Pr
k
(
)
(
)
t
, we construct a set of Poisson trials corresponding to
T
t
as
follows. For each independent tuple
t
∈
(
)
T
t
, we construct a random trial
X
t
whose
t
)
success probability
Pr
(
X
t
=
1
)=
Pr
(
. For each multi-tuple generation rule
R
⊕
(
R
⊕
∩
T
(
t
)
=
0), we combine the tuples in
R
⊕
∩
T
(
t
)
into a rule-tuple
t
R
⊕
such
t
)
that
Pr
(
t
R
⊕
)=
∑
t
∈
R
⊕
∩
T
(
t
)
Pr
(
, and construct a random trial
X
t
R
⊕
whose success
probability
Pr
(
X
t
R
⊕
=
1
)=
Pr
(
t
R
⊕
)
.
Let
X
1
,...,
X
n
be the resulting trials. Since the independent tuples and rule-
(
)
tuples in
T
t
are independent and their membership probabilities vary in general,
,...,
X
1
X
n
are independent and have unequal success probability values. They are
Poisson trials. Let
X
n
=
∑
i
=
1
X
i
. Then,
Pr
(
T
(
t
)
,
j
)=
Pr
(
X
=
j
)(
0
≤
j
≤
n
)
where
Pr
(
X
=
j
)
is the probability of
j
successes. Thus, the probability that
t
is ranked the
k
-th is
Pr
(
t
,
k
)=
Pr
(
t
)
Pr
(
X
=
k
−
1
)
. Moreover, the top-
k
probability of
t
is given
by
Pr
k
.
X
follows the Poisson binomial distribution. Therefore,
Pr
(
t
)=
Pr
(
t
)
Pr
(
X
<
k
)
(
t
,
k
)
also follows the
Poisson binomial distribution, and
Pr
k
(
t
)
follows the cumulative distribution func-
tion of
Pr
(
t
,
k
)
.