Probabilistic Path Queries on Road Networks - Ranking Queries on Uncertain Data

Database Reference

In-Depth Information

x 5

x 4

x 5

x 4

( )

x 3

( )

x 2

( )

x 1

( )

x 1

... ... ...

x t

x 1

x t

... ... ...

w em

The samples of

w em

(a) The upper bound.

(b) The lower bound.

Fig. 8.3 The upper/lower bound of F P m + 1 ( l ) .

The number of buckets is at most 2 t , as shown in the following lemma.

(

)

Lemma 8.2 (Number of buckets). Given w P m with n samples

2 t

, let

[

x z i ,

x z i ](

≤

)

≤

2 t.

Proof. In the worst case, each bucket only contains one value in w P m , which means

that the probability sum of any two consecutive values in w P m

be b exclusive buckets satisfying Equation 8.3, then b

is greater than

t .

Then, if the number of values in w P m

is greater than 2 t , the probability sum of all

values in w P m

will be greater than 1, which conflicts with the fact that w P m

is a

discrete random variable.

Constructing the buckets only requires one scan of all values in w P m . The min-

imal value in bucket

φ i =[

x z i ,

x z i ]

is min

(φ i )=

x z i , and the maximal value in

φ i is

max

(φ i )=

x z i . When computing the probability distribution of w P m + 1

using w P m ,we

only select one value in each bucket

φ i ⊂

w P m as a representative, and assign Pr

(φ i )

to the representative. If min

(φ i )

is selected as a representative, then the so computed

F P m + 1 (

)

is greater than F P m + 1 (

)

; if max

(φ i )

is used as a representative, then the so

computed F P m + 1 (

)

is smaller than F P m + 1 (

)

Example 8.2 (Bucket Approximation). Consider path P m and edge e m . Let

f P m | e m (

≤

x j

2 for 1

5 .Ift

2 , then all values in w P m are divided into

three buckets:

[

x 1

] , [

]

2 , respectively.

If the minimal value of each bucket is used as a representative, then x 1 ,x 3 and

x 5 are selected. As a result, the l-weight constrained region of P m + 1 is increased

such that the shaded area is included. So the calculated F P m + 1 (

x 2

x 3

x 4

x 5

, with probability 0

4 , 0

4 and 0

)

is greater than the

actual F P m + 1 (

)

If the maximal value of each bucket is used as a representative, then x 2 ,x 4 and

x 5 are selected. The l-weight constrained region of P m + 1 is decreased such that

the shaded area is excluded. So the calculated F P m + 1 (

)

is smaller than the actual

F P m + 1 (

)

Therefore, the average value of F P m + 1 (

and F P m + 1 (

)

can be used to approxi-

mate the actual F P m + 1 (

)

. The approximation quality is guaranteed by the following

lemma.

Ranking Queries on Uncertain Data

Search WWH ::

Custom Search

Home