Database Reference
In-Depth Information
To use Poisson approximation to evaluate a top- k query Q k
, we scan the tuples
in T in the ranking order. The sum of membership probabilities of the scanned tuples
is maintained in
(
f
)
. Moreover, for each generation rule R , let R left be the set of tuples
in R that are already scanned. Correspondingly, let
μ
μ R be the sum of membership
probabilities of the tuples in R left .
When a tuple t is scanned, if t is an independent tuple, then the top- k probability
of t can be estimated using Pr
) Γ ( k , μ)
(
! .If t belongs to a gener-
ation rule R , then the top- k probability of t can be estimated by Pr
(
t
)
F
(
k
1;
μ)=
Pr
(
t
k
1
)
μ )=
(
t
)
F
(
k
1;
) Γ ( k , μ )
μ = μ μ R . t is output if the estimated probability Pr k
Pr
passes
the probability threshold p . The scan stops when the general stopping condition in
Theorem 5.9 is satisfied.
In the Poisson approximation based method, we need to maintain the running
(
t
( k 1 ) ! , where
(
t
)
μ
and
μ R for each open rule R . Thus, the space requirement of the Poisson approx-
imation based method is O
( |R| +
1
)
, where
R
is the set of generation rules. The
, where n is the number of tuples read before the general
stopping condition is satisfied, which depends on parameter k , probability threshold
p and the probability distribution of the tuples and is independent from the size of
the uncertain table.
n )
time complexity is O
(
5.5 Online Query Answering
Since probabilistic ranking queries involve several parameters, a user may be in-
terested in how query results change as parameters vary. To support the interactive
analysis, online query answering is highly desirable.
In this section, we develop PRist+ (for p robabilistic r anking l ist s), an index for
online answering probabilistic ranking queries on uncertain data, which is compact
in space and efficient in construction.
We first describe the index data structure PRist and present a simple construction
algorithm. Then, we develop PRist+ , an advanced version of PRist that can be con-
structed more efficiently and achieve almost as good performance as PRist in query
answering.
5.5.1 The PRist Index
To answer probabilistic ranking queries, for a tuple t
T , a rank parameter k and a
probability threshold p , we often need to conduct the following two types of check-
ing operations.
Top-k probability checking : is the top- k probability of t at least p ?
p-rank checking : is the p -rank of t at most k ?
Search WWH ::




Custom Search