Database Reference
In-Depth Information
Step 3: Evaluating
If, before prob-interval
b
i
is scanned,
B
has
l
tuples such that
l
<
l
, but after
b
i
is scanned,
B
has
l
or more tuples, we enter the evaluating step.
If
B
has exactly
l
tuples, we do not need to do anything. However, if
B
has more
than
l
tuples, we need to evaluate the exact top-
k
probabilities of those tuples added
to
B
from
b
i
, and include into
B
only the top-
l
)
(
l
−
tuples of the largest top-
k
probabilities.
When a tuple can be added into the answer set without the evaluating step, the
time to retrieve the tuple is constant. There are at most
l
such tuples. Only the top-
k
probabilities of the tuples in the last prob-interval need to be evaluated. Let
d
be
the number of such tuples. Then, the time complexity of evaluating those tuples is
O
(
kmd
)
, as computing the top-
k
probabilities of those tuples requires
O
(
kmd
)
time.
The overall time complexity of query answering is
O
(
l
+
kmd
)
.
5.5.2.3 Answering Top-
(
p
,
l
)
Queries
Example 5.8 (Answering top-
queries). Consider the uncertain tuples in Fig-
ure 5.3 again. How can we answer the top-
(
p
,
l
)
(
p
,
l
)
query with p
=
0
.
5
and l
=
2
?
We only need to check prob-interval b
3
=(
0
.
4
,
0
.
6
]
that contains p. First, the
top-
2
highest ranks in b
3
.
U
3
are
3
for tuples t
3
and t
4
, which means MR
0
.
5
(
t
3
)
≤
3
and MR
0
.
5
(
t
4
)
≤
3
. Therefore, we can set
3
as the upper bound of the
0
.
5
-ranks, and
any tuple t with MR
0
.
5
>
3
cannot be the answer to the query.
Second, we scan list b
3
.
L
3
from the beginning. t
1
is added into a buffer B, since
t
1
.
3
and MR
0
.
5
might be smaller than or equal to
3
.t
3
and t
4
are added into
B due to the same reason. Then, when scanning t
2
, we find t
2
.
L
3
<
L
3
=
4
, which means
MR
0
.
5
(
4
. Thus, t
2
and any tuples following t
2
in the list (in this
example, no other tuples) can be pruned.
Last, the
0
t
2
)
≥
MR
0
.
4
(
t
2
)
≥
5
-ranks of the tuples in B (i.e., t
1
,t
3
, and t
4
) are calculated, and the
tuples with the top-
2
highest
0
.
.
5
-ranks are returned as the answer to the query,
which are t
1
and t
3
.
query, the three steps work as follows.
Step 1: Bounding
The
p
-rank of a tuple can be bounded using the following rule.
To answer a top-
(
p
,
l
)
Theorem 5.12 (Bounding
p
-ranks).
Let T be a set of uncertain tuples indexed by
PRist
with granularity parameter h. For a tuple t
∈
S and p
∈
(
0
,
1
]
, let b
i
(
1
≤
i
≤
h)
i
−
1
h
<
i
h
. Then, t
be the prob-interval such that p
∈
b
i
, i.e.,
p
≤
.
L
i
≤
MR
p
(
t
)
≤
t
.
U
i
.
Proof.
To prove Theorem 5.12, we need the following two lemmas.
Lemma 5.2 (Monotonicity).
Let t be an uncertain tuple, and p
1
and p
2
be two real
values in
(
0
,
1
]
.Ifp
1
≤
p
2
, then MR
p
1
(
t
)
≤
MR
p
2
(
t
)
.
k
1
. Then,
Pr
k
1
k
1
,
Pr
k
Proof.
Let
MR
p
1
(
t
)=
(
t
)
≥
p
1
. Moreover, for any
k
<
(
t
)
<
p
1
.
k
1
,
Pr
k
Since
p
1
≤
p
2
, for any
k
<
(
t
)
<
p
2
. Therefore,
MR
p
2
(
t
)
≥
k
1
=
MR
p
1
(
t
)
.
Lemma 5.3.
Let t be an uncertain tuple, k be a positive integer, and p be a real
values in
.IfPr
k
(
0
,
1
]
(
t
)
≤
p, then MR
p
(
t
)
≥
k.
Proof.
Since
Pr
k
k
,
Pr
x
(
t
)
≤
p
, for any
x
≤
(
t
)
≤
p
. Thus,
MR
p
(
t
)
≥
k
.