Database Reference
In-Depth Information
U:
L: 2 (t4)
4 (t4)
(0.8,1.0]
1
t4
0.8
U:
L:
3 (t4)
2 (t4)
(0.6,0.8]
0.6
U:
L:
3 (t4)
1 (t4)
(0.4,0.6]
0.4
0.2
U:
L:
U:
L:
2 (t4)
1 (t4)
2 (t4)
0 (t4)
(0.2,0.4]
0
0
1
2
3
4
5
Rank k
(0,0.2]
(a) Top- k probabilities of tuple t 4 .
(b) Index entries.
Fig. 5.2 The index entries in PRist for tuple t 4 .
To support online query answering, we need to index the top- k probabilities and
the p -ranks of tuples so that the checking operations can be conducted efficiently.
Instead of building two different indexes for top- k probabilities and p -ranks sepa-
rately, can we kill two birds with one stone?
One critical observation is that, for a tuple, the top- k probabilities and the p -ranks
can be derived from each other. We propose PRist , a list of probability intervals, to
store the rank information for tuples.
Example 5.4 (Indexing top-k probabilities). Let us consider how to index the uncer-
tain tuples in Table 2.2.
Figure 5.2(a) shows the top-k probabilities of tuple t 4 with respect to different
values of k. Interestingly, it can also be used to retrieve the p-rank of t 4 : for a given
probability p, we can draw a horizontal line for top-k probability
p, and then
check where the horizontal line cut the curve in Figure 5.2(a). The point right below
the horizontal line gives the answer k.
Storing the top-k probabilities for all possible k can be very costly, since k is in
range
=
where m is the number of rules in the data set. To save space, we can
divide the domain of top-k probabilities
[
1
,
m
]
into h prob-intervals .
In Figure 5.2(a), we partition the probability range
(
0
,
1
]
(
0
,
1
]
to 5 prob-intervals:
(
. For each interval, we record
for each tuple a lower bound and an upper bound of the ranks whose corresponding
top-k probabilities lie in the prob-interval.
For example, the top- 1 and top- 2 probabilities of t 4 are 0
0
,
0
.
2
]
,
(
0
.
2
,
0
.
4
]
,
(
0
.
4
,
0
.
6
]
,
(
0
.
6
,
0
.
8
]
, and
(
0
.
8
,
1
.
0
]
.
.
45 , respec-
tively. Therefore, a lower bound and an upper bound of ranks of t 4 in interval
0945 and 0
]
are 0 and 2 , respectively. Specifically, we define the top- 0 probability of any tuples
as 0 . Moreover, we choose rank k
(
,
.
0
0
2
=
2 as the upper bound of the 0
.
2 -rank of t 4 , since
the top- 2 probability of t 4 is greater than 0
.
2 and all top-i probabilities of t 4 for
i
<
2 are smaller than 0
.
2 . The lower bound of the 0 -rank probability of t 4 is chosen
similarly.
 
Search WWH ::




Custom Search