Conclusions - Ranking Queries on Uncertain Data

Database Reference

In-Depth Information

9.2.3.1 Uncertain data streams with non-uniformly distributed instances

In Chapter 6, we assume that the membership probabilities of all instances in a slid-

ing window are identical. This is suitable for the applications where instances are

generated using simple random sampling. However, in other applications, instances

with non-identical membership probabilities may be generated. Therefore, it is im-

portant to investigate how to adapt the methods developed in Chapter 6 for the case

of non-identical membership probabilities.

In the exact algorithm, all techniques can be used except for the “compatible

dominant set” technique. The “compatible dominant set” technique reuse the domi-

nant set of an instance in the previous sliding window, as long as the number of in-

stances from each object in the dominant set does not change. This only holds when

the membership probabilities of instances are identical. Therefore, new techniques

that can reuse the dominant set of instances in the case of non-identical membership

probabilities need to be developed.

Second, the sampling method can be used without any change. We simply draw

samples according to the distribution of instances for each object in the sliding win-

dow.

Last, in order to use the space efficient algorithms developed in Chapter 6, the

-quantile summary needs to be redefined. The major idea is to partition the in-

stances ranked in the value ascending order into intervals, so that the sum of mem-

bership probabilities of instances in each interval is at most

. For an uncertain

object W t

ω (

)

containing a set of instances o 1 ,···,

o ω , where each instance o i is

associated with a membership probability Pr

(

o i )

≤

≤ ω

). The instances are

ranked in the value ascending order. We partition o 1 ,···,

o ω

into b exclusive inter-

vals t i =[

o z i ,

o z i ]

, where

⎧

⎨

⎩

z i =

z i − 1 +

z i =

≤

b ;

x = z i Pr ( o x ) ≤ φ

z i =

max

≤

b .

≥

z i

Therefore, the probability of each interval is at most

. Moreover, there are at

most 2

intervals. This is because, in the worst case, each interval only contains

one instance, which means that the sum of membership probabilities of any two

consecutive instances is greater than

. Then, if the number of instances is greater

than 2

, the sum of membership probabilities of all instances will be greater

than 1, which conflicts with the fact that the sum of membership probabilities of all

instances of one object is 1.

Though the approximate top- k probabilities can be computed similarly as dis-

cussed in Chapter 6, details need to be worked out as future work.

Ranking Queries on Uncertain Data

Search WWH ::

Custom Search

Home