Database Reference
In-Depth Information
For an instance
o
, since the events
O
≺
o
for
O
∈O−{
O
}
are independent, we
can view
DS
as a set of independent random binary trials, where each trial
X
O
is
corresponding to an uncertain object
O
,
Pr
(
o
)
O
≺
(
X
O
=
1
)=
Pr
(
o
)
, and
Pr
(
X
O
=
1
. The event that a trial takes value 1 is called a success. Since the
probability that each trial takes value 1 is not identical, the total number of successes
in
DS
)=
1
−
Pr
(
X
O
=
1
)
(
o
)
follows the Poisson binomial distribution [103]. Thus,
Pr
(
DS
(
o
)
,
i
)
can be
computed using the Poisson binomial recurrence given in Theorem 5.2.
The cost of sorting all instances in a sliding window is
O
(
ω
(
ω))
. To com-
pute the top-
k
probability of each instance, the Poisson binomial recurrence is run
and takes cost
O
n
log
n
(
kn
)
in time. Since there are
n
ω
instances in the sliding window,
kn
2
the overall time complexity is
O
(
ω +
n
ω
log
(
n
ω))
.
Example 6.1 (Poisson binomial recurrence). Table 2.3 shows
4
uncertain streams A,
B, C, and D. For each instance, a ranking score is given. The ranking order is the
ranking score descending order: the larger the ranking score, the better the instance
is ranked.
Let us consider the sliding window W
3
(i.e., the first three columns of in-
stances in the figure), and compute the top-
2
probability of c
2
. The dominant set is
DS
(
c
2
)=
{
a
1
,
a
2
,
a
3
,
d
3
}
. Thus, p
1
=
Pr
(
A
≺
c
2
)=
Pr
(
a
1
)+
Pr
(
a
2
)+
Pr
(
a
3
)=
1
,
1
3
.
p
2
=
Pr
(
B
≺
c
2
)=
0
, and p
3
=
Pr
(
D
≺
c
2
)=
Pr
(
d
3
)=
Using Theorem 5.2, let S
1
=
{
A
}
,S
2
=
{
A
,
B
}
and S
3
=
{
A
,
B
,
D
}
. For S
1
,we
have Pr
(
S
1
,
0
)=
1
−
p
1
=
0
and Pr
(
S
1
,
1
)=
p
1
=
1
.
Fo r S
2
, we have Pr
(
S
2
,
0
)=(
1
−
p
2
)
Pr
(
S
1
,
0
)=
0
and Pr
(
S
2
,
1
)=
p
2
Pr
(
S
1
,
0
)+
(
1
−
1
.
Fo r S
3
, we have Pr
p
2
)
Pr
(
S
1
,
1
)=
(
S
3
,
0
)=(
1
−
p
3
)
Pr
(
S
2
,
0
)=
0
and Pr
(
S
3
,
1
)=
p
3
Pr
(
S
2
,
0
)+
2
3
.
(
1
−
p
3
)
Pr
(
S
2
,
1
)=
Thus, Pr
2
2
(
c
2
)=
Pr
(
c
2
)(
Pr
(
S
3
,
0
)+
Pr
(
S
3
,
1
)) =
9
.
in the ranking order, then
by one scan of the sliding window we can calculate the top-
k
probabilities for all
instance. For each stream
O
, we only need to keep the following two pieces of
information during the scan. First, we keep the number of instances in
O
that have
been scanned. Suppose there are
l
such instances, then the probability of
O
in the
Poisson recurrence is
If we sort all the instances in sliding window
W
(
O
)
l
ω
. Second, we maintain the sum of the top-
k
probabilities of
those scanned instances of
O
.
In practice, when a top-
k
query is raised,
k
n
often holds where
n
is the total
number of streams. In such a case, some streams can be pruned in the computation.
Theorem 6.1 (Pruning instances in a stream).
For an uncertain stream O, a top-
k query Q
p
with probability threshold p, and all instances in W
(
)
O
sorted in the
ranking order o
1
≺···≺
(
≤
≤
ω)
o
ω
, if there exists i
1
i
such that
i
1
j
=
−
1
Pr
k
p
−
∑
(
o
j
)
Pr
k
(
o
i
)
<
(6.1)
ω
−
i
+
1
then Pr
k
(
O
)
<
p.