Probabilistic Ranking Queries on Uncertain Data - Ranking Queries on Uncertain Data

Database Reference

In-Depth Information

Using Lemma 5.1, we have

e − με 2

[

] ≤

[

< (

− ε)μ ] <

[

] <

p holds if

e − με 2

≤

(5.2)

Combining Inequality 5.1 and 5.2, we get 2 ln p ≤ μ(

2 . The inequality in

−

μ )

Theorem 5.9 is the solution to the above inequality.

t )

value is monotonically increasing if tuples are

sorted in the ranking order. Using Theorem 5.9 an algorithm can stop and avoid

retrieving further tuples in the rear of the sorted list if the

Since

μ = ∑

(

, the

t ∈

(

)

value of the current

tuple satisfies the condition in Theorem 5.9.

The value of parameter k is typically set to much smaller than the number of

tuples in the whole data set. Moreover, since a user is interested in the tuples with

a high probability to be ranked in top- k , the probability threshold p is often not too

small. Consequently,

is often a small value. For example, if k

100, p

3, then

the stopping condition is

117.

In the experiments, we show in Figure 5.4 that the exact algorithm and the sam-

pling algorithm stop close to the general stopping condition. The results verify the

tightness of the stopping condition.

μ ≥

5.4.3 A Poisson Approximation Based Method

When the success probability is small and the number of Poisson trials is large, Pois-

son binomial distribution can be approximated well by Poisson distribution [185].

For a set of Poisson trials X 1 ,...,

X n such that Pr

(

X i =

p i , let X

= ∑

1 X i .

X follows a Poisson binomial distribution. Let

μ =

[

]= ∑

0 p i . The probability

k ! e − μ , where f

μ)= μ

of X

k can be approximated by Pr

(

) ≈

(

k ;

(

k ;

μ)

is the Poisson probability mass function. Thus, the probability of X

k can be

μ)= Γ ( k + 1 , μ)

k !

approximated by Pr

(

) ≈

(

k ;

, where F

(

k ;

μ)

is the cumulative

y t x − 1 e − t dt is the

upper incomplete gamma function. Theoretically, Le Cam [186] showed that the

quality of the approximation has the upper bound

distribution function corresponding to f

(

k ;

μ)

, and

Γ (

≤

k = 0 Pr ( X = k ) −

k = 0 f ( k ; μ)

sup

9 ma i {

p i }.

≤ l ≤ n

The above upper bound depends on only the maximum success probability in the

Poisson trials. In the worst case where max i {

1, the error bound is very loose.

However, our experimental results (Figure 5.6) show that the Poisson approximation

method achieves very good approximation quality in practice.

p i } =

Ranking Queries on Uncertain Data

Search WWH ::

Custom Search

Home