Database Reference
In-Depth Information
More recently, Cormode et al. [99] proposed to rank probabilistic tuples by ex-
pected ranks. The expected rank of a tuple t is the expectation of t 's ranks in all
possible worlds. The rank of t in a possible world W is defined as the number
of tuples ranked higher than t in W .If t does not appear in W , then its rank in
W is defined as
. Ranking by expected ranks cannot capture the semantics of
the probabilistic ranking queries discussed in this topic. For example, consider a
probabilistic table containing three tuples t 1 , t 2 and t 3 , with membership probabili-
ties 0
|
W
|
6, 1 and 1, respectively. Suppose the ranking order on the three tuples based
on their scores is t 1
.
= {
,
,
}
t 2
t 3 . There are two possible worlds W 1
t 1
t 2
t 3
and
W 2
= {
t 2
,
t 3
}
, with probabilities 0
.
6 and 0
.
4, respectively. The expected rank of t 1 is
0
×
0
.
6
+
2
×
0
.
4
=
0
.
8. The expected ranks of t 2 and t 3 can be computed similarly,
6, respectively. A top-1 query based on expected ranks returns
t 2 as the result, since t 2 has the smallest expected rank. However, is t 2 the most likely
tuple to be ranked top-1? The top-1 probabilities of t 1 , t 2 and t 3 are 0
which are 0
.
6 and 1
.
4 and 0,
respectively. Clearly, t 1 has the highest probability to be ranked top-1. A top-
.
6, 0
.
(
k
,
l
)
query with k
1 returns t 1 as the result.
Li et al. [100] discussed the problem of ranking distributed probabilistic data.
The goal is to minimize the communication cost while retrieve the top- k tuples with
expected ranks from distributed probabilistic data sets. In [101], ranking in prob-
abilistic databases is modeled as a multi-criteria optimization problem. A general
ranking function of a tuple t is defined as the weighted sum of the position probabil-
ities of t . This allows users to explore the possible ranking functions space. More-
over, how to learn the weight parameters of different position probabilities from user
preference data was discussed.
=
1 and l
=
3.4.1.2 Category II: Extensions of general traditional queries
The second category is to use probability to rank answers to a query on uncertain
data. That is, given a query on uncertain data, results to the query are ranked accord-
ing to their probabilities. Such probability is called output probabilities. In [51],
Re et al. considered arbitrary SQL queries and the ranking is on the probability
that a tuple satisfies the query instead of using a ranking function. [51] and our
study address essentially different queries and applications. Meanwhile, Zhang and
Chomicki developed the global top- k semantics on uncertain data which returns k
tuples having the largest probability in the top- k list, and gave a dynamic program-
ming algorithm [33].
3.4.1.3 How Is Our Study Related?
In this topic, we study a class of ranking queries belong to Category I, probabilistic
ranking queries (including PT- k queries and top-
(
k
,
l
)
queries) and reverse proba-
bilistic ranking queries (including RT- k queries and top-
(
r
,
l
)
queries). Those queries
Search WWH ::




Custom Search