Database Reference
In-Depth Information
For an instance o , since the events O
o for O ∈O−{
O
}
are independent, we
can view DS
as a set of independent random binary trials, where each trial X O is
corresponding to an uncertain object O , Pr
(
o
)
O
(
X O =
1
)=
Pr
(
o
)
, and Pr
(
X O =
1
. The event that a trial takes value 1 is called a success. Since the
probability that each trial takes value 1 is not identical, the total number of successes
in DS
)=
1
Pr
(
X O =
1
)
(
o
)
follows the Poisson binomial distribution [103]. Thus, Pr
(
DS
(
o
) ,
i
)
can be
computed using the Poisson binomial recurrence given in Theorem 5.2.
The cost of sorting all instances in a sliding window is O
(
ω
(
ω))
. To com-
pute the top- k probability of each instance, the Poisson binomial recurrence is run
and takes cost O
n
log
n
(
kn
)
in time. Since there are n
ω
instances in the sliding window,
kn 2
the overall time complexity is O
(
ω +
n
ω
log
(
n
ω))
.
Example 6.1 (Poisson binomial recurrence). Table 2.3 shows 4 uncertain streams A,
B, C, and D. For each instance, a ranking score is given. The ranking order is the
ranking score descending order: the larger the ranking score, the better the instance
is ranked.
Let us consider the sliding window W 3 (i.e., the first three columns of in-
stances in the figure), and compute the top- 2 probability of c 2 . The dominant set is
DS
(
c 2 )= {
a 1 ,
a 2 ,
a 3 ,
d 3 }
. Thus, p 1 =
Pr
(
A
c 2 )=
Pr
(
a 1 )+
Pr
(
a 2 )+
Pr
(
a 3 )=
1 ,
1
3 .
p 2 =
Pr
(
B
c 2 )=
0 , and p 3 =
Pr
(
D
c 2 )=
Pr
(
d 3 )=
Using Theorem 5.2, let S 1 = {
A
}
,S 2 = {
A
,
B
}
and S 3 = {
A
,
B
,
D
}
. For S 1 ,we
have Pr
(
S 1 ,
0
)=
1
p 1 =
0 and Pr
(
S 1 ,
1
)=
p 1 =
1 .
Fo r S 2 , we have Pr
(
S 2 ,
0
)=(
1
p 2 )
Pr
(
S 1 ,
0
)=
0 and Pr
(
S 2 ,
1
)=
p 2 Pr
(
S 1 ,
0
)+
(
1
1 .
Fo r S 3 , we have Pr
p 2 )
Pr
(
S 1 ,
1
)=
(
S 3 ,
0
)=(
1
p 3 )
Pr
(
S 2 ,
0
)=
0 and Pr
(
S 3 ,
1
)=
p 3 Pr
(
S 2 ,
0
)+
2
3 .
(
1
p 3 )
Pr
(
S 2 ,
1
)=
Thus, Pr 2
2
(
c 2 )=
Pr
(
c 2 )(
Pr
(
S 3 ,
0
)+
Pr
(
S 3 ,
1
)) =
9 .
in the ranking order, then
by one scan of the sliding window we can calculate the top- k probabilities for all
instance. For each stream O , we only need to keep the following two pieces of
information during the scan. First, we keep the number of instances in O that have
been scanned. Suppose there are l such instances, then the probability of O in the
Poisson recurrence is
If we sort all the instances in sliding window W
( O )
l
ω
. Second, we maintain the sum of the top- k probabilities of
those scanned instances of O .
In practice, when a top- k query is raised, k
n often holds where n is the total
number of streams. In such a case, some streams can be pruned in the computation.
Theorem 6.1 (Pruning instances in a stream). For an uncertain stream O, a top-
k query Q p with probability threshold p, and all instances in W
(
)
O
sorted in the
ranking order o 1
≺···≺
(
ω)
o ω , if there exists i
1
i
such that
i
1
j =
1 Pr k
p
(
o j )
Pr k
(
o i
) <
(6.1)
ω
i
+
1
then Pr k
(
O
) <
p.
 
Search WWH ::




Custom Search