Related Work - Ranking Queries on Uncertain Data

Database Reference

In-Depth Information

More recently, Cormode et al. [99] proposed to rank probabilistic tuples by ex-

pected ranks. The expected rank of a tuple t is the expectation of t 's ranks in all

possible worlds. The rank of t in a possible world W is defined as the number

of tuples ranked higher than t in W .If t does not appear in W , then its rank in

W is defined as

. Ranking by expected ranks cannot capture the semantics of

the probabilistic ranking queries discussed in this topic. For example, consider a

probabilistic table containing three tuples t 1 , t 2 and t 3 , with membership probabili-

ties 0

6, 1 and 1, respectively. Suppose the ranking order on the three tuples based

on their scores is t 1

≺

= {

}

t 2

t 3 . There are two possible worlds W 1

t 1

t 2

t 3

and

W 2

= {

t 2

t 3

}

, with probabilities 0

6 and 0

4, respectively. The expected rank of t 1 is

8. The expected ranks of t 2 and t 3 can be computed similarly,

6, respectively. A top-1 query based on expected ranks returns

t 2 as the result, since t 2 has the smallest expected rank. However, is t 2 the most likely

tuple to be ranked top-1? The top-1 probabilities of t 1 , t 2 and t 3 are 0

which are 0

6 and 1

4 and 0,

respectively. Clearly, t 1 has the highest probability to be ranked top-1. A top-

6, 0

(

)

query with k

1 returns t 1 as the result.

Li et al. [100] discussed the problem of ranking distributed probabilistic data.

The goal is to minimize the communication cost while retrieve the top- k tuples with

expected ranks from distributed probabilistic data sets. In [101], ranking in prob-

abilistic databases is modeled as a multi-criteria optimization problem. A general

ranking function of a tuple t is defined as the weighted sum of the position probabil-

ities of t . This allows users to explore the possible ranking functions space. More-

over, how to learn the weight parameters of different position probabilities from user

preference data was discussed.

1 and l

3.4.1.2 Category II: Extensions of general traditional queries

The second category is to use probability to rank answers to a query on uncertain

data. That is, given a query on uncertain data, results to the query are ranked accord-

ing to their probabilities. Such probability is called output probabilities. In [51],

Re et al. considered arbitrary SQL queries and the ranking is on the probability

that a tuple satisfies the query instead of using a ranking function. [51] and our

study address essentially different queries and applications. Meanwhile, Zhang and

Chomicki developed the global top- k semantics on uncertain data which returns k

tuples having the largest probability in the top- k list, and gave a dynamic program-

ming algorithm [33].

3.4.1.3 How Is Our Study Related?

In this topic, we study a class of ranking queries belong to Category I, probabilistic

ranking queries (including PT- k queries and top-

(

)

queries) and reverse proba-

bilistic ranking queries (including RT- k queries and top-

(

)

queries). Those queries

Ranking Queries on Uncertain Data

Search WWH ::

Custom Search

Home