Database Reference
In-Depth Information
have been defined in Chapter 2. Among them, reverse probabilistic ranking queries
have not been considered by any previous studies.
We proposed [30, 31] probability threshold top- k queries that find the tuples
whose probabilities of being ranked top- k are at least p , where p is a user specified
probability threshold. Probabilistic threshold top- k queries bear different semantics
from U-Top k queries and U- K Ranks queries. Consider the following example. If
there are three sensors A , B and C deployed at different locations. At time 9 AM ,
three records r A , r B , and r C are reported from those sensors with associated confi-
dences: r A
=
/
(
)=
.
=
/
(
)=
.
110 km
h with Pr
r A
0
1, r B
100 km
h with Pr
r B
0
4, and
r C
=
90 km
/
h with Pr
(
r C
)=
0
.
8. What are the top-2 speeding locations?
A U-Top2 query reports C as the answer, since
r C
is the most probably top-2
list in all possible worlds, whose probability is 0
.
432.
A U-2Ranks query reports C as the most probably 1-st speeding location with
confidence 0
.
432. For the 2-nd speeding location, C is reported again with confi-
dence 0
.
288.
A probabilistic threshold top-2 query with probability threshold p
3 returns
B and C as the 2 locations whose top-2 probabilities are no smaller than p . Their
top-2 probabilities are Pr 2
=
0
.
4 and Pr 2
(
r B )=
0
.
(
r C )=
0
.
72.
Therefore, location B has a probability of 0
4 of being ranked in the top-2 speeding
locations. But it cannot be reported by U-Top k queries or U- k Ranks queries. In the
speed monitoring application, users are more interested in the individual locations
with a high probability of being ranked top- k . The co-occurrence of speeding loca-
tions in the top- k list or the speeding location at certain ranking position may not
be important. Therefore, probabilistic threshold top- k queries are more appropriate
than U-Top k queries and U- k Ranks queries in this application scenario.
Moreover, we develop efficient query answering algorithms and effective index
structures for the proposed queries. Our unique prefix sharing technique and three
pruning techniques can greatly improve the efficiency in query answering. It is worth
noting that our algorithm can be used to answer U- K Ranks query straightforwardly,
while their algorithm may not be used to handle PT- k query directly.
.
3.4.2 Poisson Approximation
The problem of answering probability ranking queries is also related to Poisson
trials and the Poisson binomial distribution in probability and statistics.
A Poisson trial is an experiment whose outcome is randomly selected from two
possible outcomes, “success” or “failure”. Let X 1 ,...,
X n be n independent Bernoulli
random trials. For each trial X i , the success probability is p i (1
n ). The exper-
iment is called Bernoulli trials if the trials are identical and thus the success prob-
ability of all the trials are the same. Otherwise, the experiment is called Poisson
trials [102].
i
Search WWH ::




Custom Search