Database Reference
In-Depth Information
More recently, Cormode
et al.
[99] proposed to rank probabilistic tuples by ex-
pected ranks. The expected rank of a tuple
t
is the expectation of
t
's ranks in all
possible worlds. The rank of
t
in a possible world
W
is defined as the number
of tuples ranked higher than
t
in
W
.If
t
does not appear in
W
, then its rank in
W
is defined as
. Ranking by expected ranks cannot capture the semantics of
the probabilistic ranking queries discussed in this topic. For example, consider a
probabilistic table containing three tuples
t
1
,
t
2
and
t
3
, with membership probabili-
ties 0
|
W
|
6, 1 and 1, respectively. Suppose the ranking order on the three tuples based
on their scores is
t
1
.
≺
≺
=
{
,
,
}
t
2
t
3
. There are two possible worlds
W
1
t
1
t
2
t
3
and
W
2
=
{
t
2
,
t
3
}
, with probabilities 0
.
6 and 0
.
4, respectively. The expected rank of
t
1
is
0
×
0
.
6
+
2
×
0
.
4
=
0
.
8. The expected ranks of
t
2
and
t
3
can be computed similarly,
6, respectively. A top-1 query based on expected ranks returns
t
2
as the result, since
t
2
has the smallest expected rank. However, is
t
2
the most likely
tuple to be ranked top-1? The top-1 probabilities of
t
1
,
t
2
and
t
3
are 0
which are 0
.
6 and 1
.
4 and 0,
respectively. Clearly,
t
1
has the highest probability to be ranked top-1. A top-
.
6, 0
.
(
k
,
l
)
query with
k
1 returns
t
1
as the result.
Li
et al.
[100] discussed the problem of ranking distributed probabilistic data.
The goal is to minimize the communication cost while retrieve the top-
k
tuples with
expected ranks from distributed probabilistic data sets. In [101], ranking in prob-
abilistic databases is modeled as a multi-criteria optimization problem. A general
ranking function of a tuple
t
is defined as the weighted sum of the position probabil-
ities of
t
. This allows users to explore the possible ranking functions space. More-
over, how to learn the weight parameters of different position probabilities from user
preference data was discussed.
=
1 and
l
=
3.4.1.2 Category II: Extensions of general traditional queries
The second category is to use probability to rank answers to a query on uncertain
data. That is, given a query on uncertain data, results to the query are ranked accord-
ing to their probabilities. Such probability is called output probabilities. In [51],
Re
et al.
considered arbitrary SQL queries and the ranking is on the probability
that a tuple satisfies the query instead of using a ranking function. [51] and our
study address essentially different queries and applications. Meanwhile, Zhang and
Chomicki developed the global top-
k
semantics on uncertain data which returns
k
tuples having the largest probability in the top-
k
list, and gave a dynamic program-
ming algorithm [33].
3.4.1.3 How Is Our Study Related?
In this topic, we study a class of ranking queries belong to Category I, probabilistic
ranking queries (including PT-
k
queries and top-
(
k
,
l
)
queries) and reverse proba-
bilistic ranking queries (including RT-
k
queries and top-
(
r
,
l
)
queries). Those queries