Database Reference
In-Depth Information
Chapter 5
Probabilistic Ranking Queries on Uncertain
Data
In this chapter, we discuss how to answer probabilistic ranking queries defined in
Section 2.2.2 on the probabilistic database model. The techniques developed in this
chapter will also be used in Chapter 6.
Given a probabilistic table T with a set of generation rules
and a top- k selection
query Q P , f , where P is a predicate, f is a scoring function, and k is a positive integer,
the rank- k probability of a tuple t
R
T is the probability that t is ranked at the k -th
position in possible worlds according to Q P , f , that is
W ∈W s.t. t = W f ( k )
Pr
(
t
,
k
)=
Pr
(
W
)
where W f (
denotes the tuple ranked at the k -th position in W .
Moreover, the top- k probability of t is the probability that t is ranked top- k in
possible worlds according to f , that is,
k
)
k
j = 1 Pr ( t , j )
Pr k
(
t
)=
Last, the p-rank of t is the minimum k such that Pr k
(
)
t
p , denoted by
Pr k
MR p
(
)=
{
|
(
)
}
t
min
k
t
p
Four types of probabilistic ranking queries are developed.
A probabilistic threshold top- k query ( PT-k query for short) [30, 31] finds the
tuples whose top- k probabilities are at least a probability threshold p
(
0
<
p
1
)
.
A rank threshold query ( RT-k query for short) retrieves the tuples whose p -
ranks are at most k .RT- k queries are reverse queries of PT- k queries.
A top-
(
k
,
l
)
query [32, 33] finds the top- l tuples with the highest top- k probabil-
ities
(
l
>
0
)
.
A top-
(
p
,
l
)
query returns the top- l tuples with the smallest p -ranks. Top-
(
p
,
l
)
queries are reverse queries of top-
(
k
,
l
)
queries.
89
Search WWH ::




Custom Search