Top-k Typicality Queries on Uncertain Data - Ranking Queries on Uncertain Data


4.2 Local Typicality Approximation

While the randomized tournament method is efficient, it does not have formally provable accuracy. Can we provide some quality guarantee and at the same time largely retain the efficiency? In this section, we develop several heuristic local typicality approximation methods. Our discussion in this section is for simple typicality. The methods will be extended to other typicality measures later in this chapter.

4.2.1 Locality of Typicality Approximation

In Gaussian kernel estimation, given two instances $a$ and $p$ in an uncertain object $O$, the contribution from $p$ to $T(a, O)$, the simple typicality score of $a$, is

$$\frac{1}{nh\sqrt{2\pi}} \, e^{-\frac{d(a,p)^2}{2h^2}},$$

where $n$ is the cardinality of the uncertain object $O$. The contribution of $p$ decays exponentially as the distance between $a$ and $p$ increases. Therefore, if $p$ is remote from $a$, $p$ contributes very little to the simple typicality score of $a$.

Moreover, in a metric space, given three instances $a$, $b$ and $p$, the triangle inequality $|d(a,p) - d(b,p)| \le d(a,b)$ holds. If $d(a,p) \gg d(a,b)$, then $d(a,p) \approx d(b,p)$. Therefore, the instances far away from $a$ and $b$ will have similar contributions to the probability density values $T(a, O)$ and $T(b, O)$.

Based on the above observations, given an uncertain object $O$ and a subset $C \subseteq O$, can we use the locality to approximate the instance having the largest simple typicality value in $C$?
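The two observations above can be checked numerically. The following is a minimal sketch, assuming Euclidean distance for $d$; the function name `kernel_contribution` and all numeric values are hypothetical and chosen only for illustration.

```python
import math

def kernel_contribution(a, p, n, h):
    """Contribution of instance p to the simple typicality score of a:
    exp(-d(a,p)^2 / (2 h^2)) / (n h sqrt(2 pi)), with Euclidean d (an assumption)."""
    dist = math.dist(a, p)
    return math.exp(-dist ** 2 / (2 * h ** 2)) / (n * h * math.sqrt(2 * math.pi))

# Hypothetical setting: n instances in O, bandwidth h.
n, h = 100, 1.0
a, b = (0.0, 0.0), (0.5, 0.0)        # two instances close to each other
near, far = (1.0, 0.0), (10.0, 0.0)  # one instance near a, one remote from a and b

near_a = kernel_contribution(a, near, n, h)
far_a = kernel_contribution(a, far, n, h)
far_b = kernel_contribution(b, far, n, h)

# Triangle inequality: |d(a,p) - d(b,p)| <= d(a,b)
gap = abs(math.dist(a, far) - math.dist(b, far))

print(near_a, far_a, far_b, gap)
```

In this setting the remote instance contributes many orders of magnitude less than the nearby one, and its contributions to `a` and to `b` are both negligible, which is the locality that the rest of the section exploits.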

Definition 4.1 (Neighborhood region). Given an uncertain object $O$, a neighborhood threshold $\sigma$, and a subset $C \subseteq O$, let $D = D_{A_1} \times \cdots \times D_{A_n}$, where $D_{A_i}$ is the domain of attribute $A_i$ ($1 \le i \le n$). The $\sigma$-neighborhood region of $C$ is defined as

$$D(C, \sigma) = \{\, o \in D \mid \min_{o' \in C} \{ d(o, o') \} \le \sigma \,\}.$$
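Membership in a neighborhood region reduces to a nearest-instance distance test. The sketch below assumes Euclidean distance and represents points as tuples; the function name `in_neighborhood_region` and the sample data are hypothetical.

```python
import math

def in_neighborhood_region(point, C, sigma):
    """Return True if `point` lies in the sigma-neighborhood region of C,
    i.e. its distance to the nearest instance in C is at most sigma."""
    return min(math.dist(point, c) for c in C) <= sigma

# Hypothetical subset C and threshold sigma.
C = [(0.0, 0.0), (4.0, 0.0)]
sigma = 1.0

inside = in_neighborhood_region((0.5, 0.5), C, sigma)    # close to (0, 0)
boundary = in_neighborhood_region((4.0, 1.0), C, sigma)  # exactly sigma from (4, 0)
outside = in_neighborhood_region((2.0, 0.0), C, sigma)   # 2 away from both
print(inside, boundary, outside)
```

The region is a union of closed balls of radius $\sigma$ around the instances in $C$, so a point between two distant instances of $C$ can fall outside it.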

Definition 4.2 (Local simple typicality). Given an uncertain object $O$, a neighborhood threshold $\sigma$, and a subset $C \subseteq O$, let $\mathcal{X}$ be the random vector that generates samples $O$. The local simple typicality of an object $o \in C$ is defined as

$$LT(o, C, \mathcal{X}, \sigma) = L(o \mid \mathcal{X}_{D(C,\sigma)}),$$

where $L(o \mid \mathcal{X}_{D(C,\sigma)})$ is the likelihood of $o$ given that it is a sample of $\mathcal{X}$ in region $D(C, \sigma)$.

In practice, for each instance $o \in O$, we use the set of instances in $O$ that lie in $o$'s $\sigma$-neighborhood region to estimate the simple typicality of $o$.

Definition 4.3 (Local neighborhood). Given an uncertain object $O$, a neighborhood threshold $\sigma$, and a subset $C \subseteq O$, the $\sigma$-neighborhood of $C$ is defined as

$$LN(C, O, \sigma) = \{\, o \mid o \in O \cap D(C, \sigma) \,\},$$

where $D(C, \sigma)$ is the $\sigma$-neighborhood region of $C$.

$LN(C, O, \sigma)$ is the set of instances in $O$ whose distance to at least one instance in $C$ is at most $\sigma$. Then, $LT(o, C, \mathcal{X}, \sigma)$ can be estimated using $LT(o, C, O, \sigma) =$
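The right-hand side of the estimator is not shown in this excerpt. As an assumption for illustration only, the sketch below restricts the Gaussian kernel estimate from Section 4.2.1 to the local neighborhood, averaging kernel contributions over the instances in the local neighborhood of `C` instead of over all of `O`. The function names, the Euclidean distance, and the sample data are hypothetical.

```python
import math

def local_neighborhood(C, O, sigma):
    """Instances of O within distance sigma of at least one instance in C."""
    return [o for o in O if min(math.dist(o, c) for c in C) <= sigma]

def local_typicality(o, C, O, sigma, h):
    """Assumed estimator: Gaussian kernel density at o, computed only over
    the local neighborhood of C (non-empty whenever C is a subset of O)."""
    ln = local_neighborhood(C, O, sigma)
    norm = len(ln) * h * math.sqrt(2 * math.pi)
    return sum(math.exp(-math.dist(o, q) ** 2 / (2 * h ** 2)) for q in ln) / norm

# Hypothetical one-dimensional uncertain object O and subset C.
O = [(0.0,), (0.2,), (0.4,), (5.0,)]
C = [(0.0,), (0.2,)]
sigma, h = 0.5, 0.5

ln = local_neighborhood(C, O, sigma)  # the remote instance (5.0,) is excluded
best = max(C, key=lambda o: local_typicality(o, C, O, sigma, h))
print(ln, best)
```

Under these assumptions, only the instances near `C` are scanned when comparing the candidates in `C`, which is where the efficiency gain over a full kernel estimate would come from.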
