Database Reference
In-Depth Information
9.2.3.1 Uncertain data streams with non-uniformly distributed instances
In Chapter 6, we assume that the membership probabilities of all instances in a slid-
ing window are identical. This is suitable for the applications where instances are
generated using simple random sampling. However, in other applications, instances
with non-identical membership probabilities may be generated. Therefore, it is im-
portant to investigate how to adapt the methods developed in Chapter 6 for the case
of non-identical membership probabilities.
In the exact algorithm, all techniques can be used except for the “compatible
dominant set” technique. The “compatible dominant set” technique reuse the domi-
nant set of an instance in the previous sliding window, as long as the number of in-
stances from each object in the dominant set does not change. This only holds when
the membership probabilities of instances are identical. Therefore, new techniques
that can reuse the dominant set of instances in the case of non-identical membership
probabilities need to be developed.
Second, the sampling method can be used without any change. We simply draw
samples according to the distribution of instances for each object in the sliding win-
dow.
Last, in order to use the space efficient algorithms developed in Chapter 6, the
φ
-quantile summary needs to be redefined. The major idea is to partition the in-
stances ranked in the value ascending order into intervals, so that the sum of mem-
bership probabilities of instances in each interval is at most
φ
. For an uncertain
object W t
ω (
O
)
containing a set of instances o 1 ,···,
o ω , where each instance o i is
associated with a membership probability Pr
(
o i )
(1
i
ω
). The instances are
ranked in the value ascending order. We partition o 1 ,···,
o ω
into b exclusive inter-
vals t i =[
o z i ,
o z i ]
, where
z i =
1
,
i
=
1;
z i 1 +
z i =
1
,
1
<
i
b ;
j
j
x = z i Pr ( o x ) φ
z i =
max
j
,
1
i
b .
z i
Therefore, the probability of each interval is at most
φ
. Moreover, there are at
1
most 2
intervals. This is because, in the worst case, each interval only contains
one instance, which means that the sum of membership probabilities of any two
consecutive instances is greater than
φ
φ
. Then, if the number of instances is greater
1
φ
than 2
, the sum of membership probabilities of all instances will be greater
than 1, which conflicts with the fact that the sum of membership probabilities of all
instances of one object is 1.
Though the approximate top- k probabilities can be computed similarly as dis-
cussed in Chapter 6, details need to be worked out as future work.
Search WWH ::




Custom Search