Related Work - Ranking Queries on Uncertain Data

Database Reference

In-Depth Information

have been defined in Chapter 2. Among them, reverse probabilistic ranking queries

have not been considered by any previous studies.

We proposed [30, 31] probability threshold top- k queries that find the tuples

whose probabilities of being ranked top- k are at least p , where p is a user specified

probability threshold. Probabilistic threshold top- k queries bear different semantics

from U-Top k queries and U- K Ranks queries. Consider the following example. If

there are three sensors A , B and C deployed at different locations. At time 9 AM ,

three records r A , r B , and r C are reported from those sensors with associated confi-

dences: r A

(

110 km

h with Pr

r A

1, r B

100 km

h with Pr

r B

4, and

r C

90 km

h with Pr

(

r C

8. What are the top-2 speeding locations?

•

A U-Top2 query reports C as the answer, since

r C

is the most probably top-2

list in all possible worlds, whose probability is 0

432.

•

A U-2Ranks query reports C as the most probably 1-st speeding location with

confidence 0

432. For the 2-nd speeding location, C is reported again with confi-

dence 0

288.

•

A probabilistic threshold top-2 query with probability threshold p

3 returns

B and C as the 2 locations whose top-2 probabilities are no smaller than p . Their

top-2 probabilities are Pr 2

4 and Pr 2

(

r B )=

(

r C )=

72.

Therefore, location B has a probability of 0

4 of being ranked in the top-2 speeding

locations. But it cannot be reported by U-Top k queries or U- k Ranks queries. In the

speed monitoring application, users are more interested in the individual locations

with a high probability of being ranked top- k . The co-occurrence of speeding loca-

tions in the top- k list or the speeding location at certain ranking position may not

be important. Therefore, probabilistic threshold top- k queries are more appropriate

than U-Top k queries and U- k Ranks queries in this application scenario.

Moreover, we develop efficient query answering algorithms and effective index

structures for the proposed queries. Our unique prefix sharing technique and three

pruning techniques can greatly improve the efficiency in query answering. It is worth

noting that our algorithm can be used to answer U- K Ranks query straightforwardly,

while their algorithm may not be used to handle PT- k query directly.

3.4.2 Poisson Approximation

The problem of answering probability ranking queries is also related to Poisson

trials and the Poisson binomial distribution in probability and statistics.

A Poisson trial is an experiment whose outcome is randomly selected from two

possible outcomes, “success” or “failure”. Let X 1 ,...,

X n be n independent Bernoulli

random trials. For each trial X i , the success probability is p i (1

n ). The exper-

iment is called Bernoulli trials if the trials are identical and thus the success prob-

ability of all the trials are the same. Otherwise, the experiment is called Poisson

trials [102].

≤

Ranking Queries on Uncertain Data

Search WWH ::

Custom Search

Home