Probabilistic Ranking Queries on Uncertain Data - Ranking Queries on Uncertain Data


Definition 2.7 (Representing region). Given an uncertain object O on attributes $A_1, \ldots, A_n$ and a subset of instances $A \subset O$, let $D = D_{A_1} \times \cdots \times D_{A_n}$, where $D_{A_i}$ is the domain of attribute $A_i$ ($1 \le i \le n$). The representing region of an instance $o \in A$ is $D(o, A) = \{x \mid x \in D,\ d(x, o) = \min_{y \in A} d(x, y)\}$, where $d(x, y)$ is the distance between objects x and y.
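The membership test in Definition 2.7 can be sketched in a few lines of Python. This is only an illustration: it assumes instances are numeric tuples and that d is the Euclidean distance, neither of which the definition requires, and the function names are hypothetical.

```python
import math

def in_representing_region(x, o, A):
    """True if x lies in D(o, A), i.e. d(x, o) = min over y in A of d(x, y).

    Assumes Euclidean distance (an illustrative choice for d).
    """
    nearest = min(math.dist(x, y) for y in A)
    return math.isclose(math.dist(x, o), nearest)

def region_owner(x, A):
    """Return one instance of A whose representing region contains x."""
    return min(A, key=lambda o: math.dist(x, o))

A = [(0.0, 0.0), (4.0, 0.0)]
print(in_representing_region((1.0, 1.0), A[0], A))  # True
print(in_representing_region((1.0, 1.0), A[1], A))  # False
print(in_representing_region((2.0, 3.0), A[1], A))  # True: equidistant boundary point
```

A point that is equidistant from two instances of A satisfies the definition for both of them, so representing regions share their boundaries; `region_owner` simply returns the first nearest instance.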

To make A representative as a whole, the representing region of each instance o in A should be fairly large and o should be typical in its own representing region.

Definition 2.8 (Group typicality). Given an uncertain object O on attributes $A_1, \ldots, A_n$ and a subset of instances $A \subset O$, let X be the n-dimensional random vector generating the instances in O. The group typicality of A on attributes $A_{i_1}, \ldots, A_{i_l}$ ($1 \le i_j \le n$, $1 \le j \le l$) is $GT(A, X) = \sum_{o \in A} T(o, X_{D(o,A)}) \cdot Pr(D(o, A))$, where $T(o, X_{D(o,A)})$ is the simple typicality of o with respect to X in o's representing region $D(o, A)$ and $Pr(D(o, A))$ is the probability of $D(o, A)$.

Since the distribution of X is unknown, we can estimate the group typicality $GT(A, X)$ as follows. For any instance $o \in A$, let $N(o, A, O) = \{x \mid x \in O \cap D(o, A)\}$ be the set of instances in O that lie in $D(o, A)$. Then $Pr(D(o, A))$ can be estimated by $\frac{|N(o,A,O)|}{|O|}$, and the group typicality $GT(A, X)$ is estimated by $GT(A, O) = \sum_{o \in A} T(o, N(o, A, O)) \cdot \frac{|N(o,A,O)|}{|O|}$, where $T(o, N(o, A, O))$ is the estimator of the simple typicality $T(o, X_{D(o,A)})$, since $N(o, A, O)$ can be viewed as a set of independent and identically distributed samples of X that lie in $D(o, A)$.
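The estimator $GT(A, O)$ can be sketched as follows. The code assumes that instances are numeric tuples, that d is the Euclidean distance, and that simple typicality is estimated by a Gaussian kernel density with a hypothetical bandwidth `h`; none of these choices is fixed by the passage above, and the function names are illustrative.

```python
import math

def simple_typicality(o, sample, h=1.0):
    """Assumed estimator of T(o, sample): Gaussian kernel density at o."""
    if not sample:
        return 0.0
    norm = 1.0 / (len(sample) * h * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-math.dist(o, s) ** 2 / (2 * h * h)) for s in sample)

def group_typicality(A, O, h=1.0):
    """Estimate GT(A, O) = sum over o in A of T(o, N(o,A,O)) * |N(o,A,O)| / |O|.

    A must be a non-empty list of instances taken from O.
    """
    # N[i] collects the instances of O that fall in the representing region of A[i].
    # Boundary ties go to the first nearest instance (a simplification).
    N = [[] for _ in A]
    for x in O:
        i = min(range(len(A)), key=lambda j: math.dist(x, A[j]))
        N[i].append(x)
    return sum(simple_typicality(o, n, h) * len(n) / len(O) for o, n in zip(A, N))

O = [(0.0,), (1.0,), (2.0,), (10.0,)]
print(group_typicality([(1.0,)], O))
print(group_typicality([(1.0,), (10.0,)], O))
```

Under this kernel choice, adding an instance to A can only move each instance of O closer to its representative, so the estimate never decreases when A grows.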

The group typicality score measures how representative a group of instances is. The size-k most typical group problem is to find k instances as a group such that the group has the maximum group typicality. Unfortunately, the problem is NP-hard, since it has the discrete k-median problem as a special case, which was shown to be NP-hard [28].
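For very small inputs the size-k most typical group can still be found by enumerating every k-subset, which makes the cost of an exact answer concrete. The sketch below is a brute-force illustration, not a method described in the text. It assumes Euclidean distance and a Gaussian kernel estimator of simple typicality with a hypothetical bandwidth `h`; under that assumption the estimated group typicality reduces to a sum of kernel values between each instance of O and its nearest member of A.

```python
import math
from itertools import combinations

def gt(A, O, h=1.0):
    """Estimated group typicality of a non-empty group A (Gaussian kernel assumed)."""
    total = 0.0
    for x in O:
        d = min(math.dist(x, o) for o in A)  # distance to x's representative in A
        total += math.exp(-d * d / (2 * h * h))
    return total / (len(O) * h * math.sqrt(2 * math.pi))

def most_typical_group(O, k, h=1.0):
    """Exhaustively search all k-subsets of O; the number of subsets grows as C(|O|, k)."""
    return max(combinations(O, k), key=lambda A: gt(A, O, h))

O = [(0.0,), (1.0,), (2.0,), (10.0,)]
print(most_typical_group(O, 1))  # ((1.0,),)
print(most_typical_group(O, 2))  # ((1.0,), (10.0,))
```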

Moreover, top-k queries are generally expected to be monotonic in their answer sets. That is, the result of a top-k query is contained in the result of a top-k' query whenever k < k'. However, an instance in the most typical group of size k may not be in the most typical group of size k' (k < k'). For example, in the data set illustrated in Figure 2.2, the size-2 most typical group does not contain the size-1 most typical group. Therefore, the size-k most typical group is not suitable for defining the top-k representative typicality. To enforce monotonicity, we adopt a greedy approach.

Definition 2.9 (Representative typicality). Given an uncertain object O and a reported answer set $A \subset O$, let X be the random vector with respect to instances in O. The representative typicality of an instance $o \in (O - A)$ is $RT(o, A, X) = GT(A \cup \{o\}, X) - GT(A, X)$, where $GT(A \cup \{o\}, X)$ and $GT(A, X)$ are the group typicality values of subsets $A \cup \{o\}$ and A, respectively.

In practice, we use $RT(o, A, O) = GT(A \cup \{o\}, O) - GT(A, O)$ to estimate $RT(o, A, X)$, where $GT(A, O)$ and $GT(A \cup \{o\}, O)$ are the estimators of $GT(A, X)$ and $GT(A \cup \{o\}, X)$, respectively.
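One way to read the greedy approach is to grow the answer set one instance at a time, always adding the instance with the largest estimated representative typicality. The sketch below follows that reading under stated assumptions: Euclidean distance, a Gaussian kernel estimator of simple typicality with a hypothetical bandwidth `h`, and a group typicality of 0 for the empty answer set. The function names are illustrative and do not come from the text.

```python
import math

def gt(A, O, h=1.0):
    """Estimated group typicality GT(A, O); taken to be 0 for the empty set (assumption)."""
    if not A:
        return 0.0
    total = 0.0
    for x in O:
        d = min(math.dist(x, o) for o in A)  # distance to x's representative in A
        total += math.exp(-d * d / (2 * h * h))
    return total / (len(O) * h * math.sqrt(2 * math.pi))

def rt(o, A, O, h=1.0):
    """Estimated representative typicality RT(o, A, O) = GT(A + {o}, O) - GT(A, O)."""
    return gt(A + [o], O, h) - gt(A, O, h)

def greedy_top_k(O, k, h=1.0):
    """Repeatedly add the remaining instance with the largest RT to the answer set."""
    A, remaining = [], list(O)
    while remaining and len(A) < k:
        best = max(remaining, key=lambda c: rt(c, A, O, h))
        A.append(best)
        remaining.remove(best)
    return A

O = [(0.0,), (1.0,), (2.0,), (10.0,)]
print(greedy_top_k(O, 1))  # [(1.0,)]
print(greedy_top_k(O, 2))  # [(1.0,), (10.0,)]
```

Because each answer set extends the previous one, the top-k result is always a prefix of the top-k' result for k < k', which is the monotonicity property the greedy definition is meant to enforce.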
