Database Reference
In-Depth Information
U:
L:
2 (t4)
4 (t4)
(0.8,1.0]
1
t4
0.8
U:
L:
3 (t4)
2 (t4)
(0.6,0.8]
0.6
U:
L:
3 (t4)
1 (t4)
(0.4,0.6]
0.4
0.2
U:
L:
U:
L:
2 (t4)
1 (t4)
2 (t4)
0 (t4)
(0.2,0.4]
0
0
1
2
3
4
5
Rank k
(0,0.2]
(a) Top-
k
probabilities of tuple
t
4
.
(b) Index entries.
Fig. 5.2
The index entries in
PRist
for tuple
t
4
.
To support online query answering, we need to index the top-
k
probabilities and
the
p
-ranks of tuples so that the checking operations can be conducted efficiently.
Instead of building two different indexes for top-
k
probabilities and
p
-ranks sepa-
rately, can we kill two birds with one stone?
One critical observation is that, for a tuple, the top-
k
probabilities and the
p
-ranks
can be derived from each other. We propose
PRist
, a list of probability intervals, to
store the rank information for tuples.
Example 5.4 (Indexing top-k probabilities). Let us consider how to index the uncer-
tain tuples in Table 2.2.
Figure 5.2(a) shows the top-k probabilities of tuple t
4
with respect to different
values of k. Interestingly, it can also be used to retrieve the p-rank of t
4
: for a given
probability p, we can draw a horizontal line for top-k probability
p, and then
check where the horizontal line cut the curve in Figure 5.2(a). The point right below
the horizontal line gives the answer k.
Storing the top-k probabilities for all possible k can be very costly, since k is in
range
=
where m is the number of rules in the data set. To save space, we can
divide the domain of top-k probabilities
[
1
,
m
]
into h
prob-intervals
.
In Figure 5.2(a), we partition the probability range
(
0
,
1
]
(
0
,
1
]
to
5
prob-intervals:
(
. For each interval, we record
for each tuple a lower bound and an upper bound of the ranks whose corresponding
top-k probabilities lie in the prob-interval.
For example, the top-
1
and top-
2
probabilities of t
4
are
0
0
,
0
.
2
]
,
(
0
.
2
,
0
.
4
]
,
(
0
.
4
,
0
.
6
]
,
(
0
.
6
,
0
.
8
]
, and
(
0
.
8
,
1
.
0
]
.
.
45
, respec-
tively. Therefore, a lower bound and an upper bound of ranks of t
4
in interval
0945
and
0
]
are
0
and
2
, respectively. Specifically, we define the top-
0
probability of any tuples
as
0
. Moreover, we choose rank k
(
,
.
0
0
2
=
2
as the upper bound of the
0
.
2
-rank of t
4
, since
the top-
2
probability of t
4
is greater than
0
.
2
and all top-i probabilities of t
4
for
i
<
2
are smaller than
0
.
2
. The lower bound of the
0
-rank probability of t
4
is chosen
similarly.