Database Reference
In-Depth Information
Anaıve method of answering probabilistic ranking queries is to enumerate all
possible worlds and apply the query to each possible world. Then, we can compute
the top- k probability and p -rank of each tuple and select the tuples satisfying the
queries. Unfortunately, the naıve method is inefficient since, as discussed before,
there can be a huge number of possible worlds on an uncertain table. In [49], Dalvi
and Suciu showed that even the problem of counting all possible worlds with distinct
top- k lists is # P -Complete. Therefore, enumerating all possible worlds is too costly
on large uncertain data sets. That motivates our development of efficient algorithms
which avoid searching all possible worlds.
In this chapter, we first discuss the efficient top- k probability computation in
Section 5.1 and present an efficient exact algorithm in Section 5.2. Then, we de-
velop a fast sampling algorithm and a Poisson approximation based algorithm in
Sections 5.3 and 5.4, respectively. Last, to support efficient online query answering,
we propose PRist+ , a compact index, in Section 5.5. An efficient index construc-
tion algorithm and efficient query answering methods are developed for PRist+ .An
empirical study in Section 5.6 using real and synthetic data sets verifies the effec-
tiveness of probabilistic ranking queries and the efficiency of our methods.
5.1 Top- k Probability Computation
In this section, we first introduce how to compute the exact rank- k probability val-
ues. Top- k probabilities and p -ranks can be directly derived from rank- k probabili-
ties.
5.1.1 The Dominant Set Property
Hereafter, by default we consider a top- k selection query Q P , f on an uncertain table
T . P
(
)= {
|
(
)=
}
T
t
t
T
P
t
true
is the set of tuples satisfying the query predicate.
(
)
(
)
P
carries the same member-
ship probability as in T . Moreover, a generation rule R in T is projected to P
T
is also an uncertain table where each tuple in P
T
(
T
)
by
removing all tuples from R that are not in P ( T )
.
contains all tuples satisfying the query, as well as the membership proba-
bilities and the generation rules. Removing tuples not in P
P
(
T
)
(
T
)
does not affect the
rank- k probabilities of the tuples in P
)
in computing rank- k probabilities. To keep our discuss simple, we use T to denote
the set of tuples satisfying query predicate P .
For a tuple t
(
T
)
. Therefore, we only need to consider P
(
T
T and a possible world W such that t
W , whether t
W f (
k
)
depends only on how many other tuples in T ranked higher than t appear in W .
Definition 5.1 (Dominant set). Given a scoring function f on a probabilistic table
T , for a tuple t
T , the dominant set of t is the subset of tuples in T that are ranked
higher than t , i.e., S t = {
t |
t
t f t
T
}
.
Search WWH ::




Custom Search