Top-k Typicality Queries on Uncertain Data - Ranking Queries on Uncertain Data

Database Reference

In-Depth Information

in Figure 4.2 from the root node N 1 , and recursively examine each internal node.

During the search, node N 4 can be pruned since all points in N 4 lie out of the

neighborhood of N 6 .

The

cost

computing

the

local

simple

typicality

instance x is

( |

( {

, σ ) | )

. Then, computing the local simple typicality of all instances in

( ∑ x ∈ O |

O takes O

time. Although the search can be sped up using a

VP-tree, the complexity of the DLTA algorithm is still O

( {

, σ ) | )

( |

)

. The reason is, in the

worst case where

is larger than the diameter (i.e., the largest pairwise distance) of

the data set, the

-neighborhood of each instance contains all other instances in O .

4.2.3 LT3: Local Typicality Approximation Using Tournaments

To reduce the cost of computing local simple typicality further, we incorporate the

tournament mechanism, and propose a local typicality approximation algorithm us-

ing tournaments (the LT3 algorithm). The basic idea is to group the instances locally,

and conduct a tournament in each local group of instances. The instances with the

largest local simple typicality is selected as the winner. The winners are sent to the

next round of tournament. The tournaments terminate when only one instance is left

as the winner. A sampling method is employed to reduce the computational cost.

4.2.3.1 Local Typicality Trees (LT-trees)

A local typicality tree (LT-tree) is an MVP-tree [180] with auxiliary information that

supports local typicality calculation and tournaments. Given an uncertain object O ,

an LT-tree can be constructed as follows.

First, we construct an MVP-tree [180] on O , which is a t -nary VP-tree that uses

more than one vantage point to partition the space. Without loss of generality, let

us assume t

2 l and the data set contains t m instances. We assign a layer number

to each node in the MVP-tree. The root node has layer number 0, and a node is

assigned layer number

if its parent has layer number i . We remove all those

nodes in the MVP-tree whose layer number is not a multiple of t . For a node N

of layer number jt

(

)

(

≥

)

, we connect N to its ancestor in the MVP-tree of layer

(

t .

Second, we compute three pieces of information, the approximate center, the

radius, and the

−

)

-neighborhood, for each node in the LT-tree.

Approximate Center

For a node N in the LT-tree, let O N be the set of instances at N . To compute the

approximate center at a node N in the LT-tree, we draw a sample R of |

in-

stances from O N , and compute the pairwise distance between every two instances in

O N |

Ranking Queries on Uncertain Data

Search WWH ::

Custom Search

Home