Top-k Typicality Queries on Uncertain Data - Ranking Queries on Uncertain Data

Database Reference

In-Depth Information

∩ A

∩ A and A

= A

∩ A . So in the rest of the

Proof. If A

0, then let A

−

∩ A

proof, we assume A

We sort the instances in A in the descending order of their typicality values,

and sort the instances in A in the descending order of their local typicality values.

Let o be the i -th instance in A , and

o be the i -th instance in A (1

≤

k ). From

Inequality 4.3, we have

h √ 2

e − σ 2

≤

(

) −

(

, σ ) <

2 h 2

and

h √ 2

e − σ 2

≤

(

) −

(

, σ ) <

2 h 2

∈ A , it holds that LT

Moreover, since o

(

, σ ) <

(

, σ )

. Thus,

(

) −

(

)=(

(

) −

(

, σ ))

− (

(

) −

(

, σ ))+(

(

, σ )

e − σ 2

h √ 2

−

(

, σ )) <

2 h 2

Inequality 4.7 follows by summing up the above difference at each rank i (1

≤

k ).

Searching the

-neighborhood for each instance can be very costly. To imple-

ment the direct local typicality approximation efficiently, we can use the VP-tree

index [179] to support

-neighborhood searches effectively.

A VP-tree [179] is a binary space partitioning (BSP) tree. Given a set of instances

O , a VP-tree T indexes the instances in O . Each node in a VP-tree represents a subset

of O . Roughly speaking, for each non-leaf node N and the set of nodes O N at N ,a

vantage point is selected to divide the set O N into two exclusive subsets O N 1

and

O N 2

( O N =

O N 1 ∪

O N 2 ) such that, to search the instances within distance

to an

instance p

N , likely we only need to search either O N 1 or O N 2 but not both. O N 1

and O N 2 are used to construct the two children of N . For example, the VP-tree in

Figure 4.2 indexes the instances in Figure 4.1.

A VP-tree can be constructed top-down starting from the root which represents

the whole set of instances. A sampling method is given in [179] to select vantage

points for internal nodes. Then, the first half subset of instances that are close to

the vantage point form the left child of the root, and the other instances form the

right child. The left and the right children are further divided recursively until a

node contains only one instance (a leaf node). A VP-tree can be constructed in cost

∈

( |

Searching a VP-tree for the

log

| )

-neighborhood of a query point is straightforward

using the recursive tree search. Once an internal node in the tree can be determined

in the

-neighborhood of the query point, all descendant instances of the inter-

nal node are in the neighborhood and no subtrees need to be searched. For exam-

ple, the

-neighborhood of node N 6 = {

}

in Figure 4.1 is represented by the

dashed circle. To find all points in the

-neighborhood of N 6 , we search the VP-tree

Ranking Queries on Uncertain Data

Search WWH ::

Custom Search

Home