Top-k Typicality Queries on Uncertain Data - Ranking Queries on Uncertain Data

Database Reference

In-Depth Information

R . Then, for each instance x

R , the center-score of x is the maximum distance from

x to another point in R . The instance in R of the minimum center-score is chosen as

the center. This center approximation procedure is popularly used in computational

geometry. It takes O

∈

( |

O N | )

time for each node N , and O

( |

log t |

| )

time for all

nodes in the LT-tree.

Radius

Once the center c of a node N is chosen, the radius is given by the maximum distance

between c and the other instances at N . This can be computed in time O ( | O N

| )

for

each node N , and O

( |

log t |

| )

for all nodes in the LT-tree.

-neighborhood

We use a range query in the LT-tree to compute a superset of the

-neighborhood of

O N for every node N in the LT-tree, which always achieves a typicality approx-

imation no worse than using the

-neighborhood. To compute the superset, we

start from the root and iteratively search for the nodes that completely lie in the

-neighborhood of N , using the approximate center and radius of N . Once all ob-

jects at a node N are in the

-neighborhood of N ,weuse N to represent them and

do not search any subtrees of N .

4.2.3.2 Query Answering

To answer a top- k simple typicality query, we run tournaments on the LT-tree in a

bottom-up fashion. First, a tournament is run for each leaf node in the LT-tree. The

winner enters the tournament at the parent node of the leaf node. The winner o 1 at

the root node is the approximation of the most typical instance. Figure 4.2 illustrates

the procedure of computing the approximate most typical point in the set of points

in Figure 4.1 using an LT-tree. During the tournaments in the leaf nodes,

{

}

are selected as local winners and sent to the parent nodes. Then,

are selected

in the tournaments in nodes N 2 and N 3 . Finally, e is selected as the winner, which

approximates the most typical point in the data set.

To find the approximation of the second most typical instance, we do not need

to completely run the tournaments again. Instead, we can reuse most of the results

in the tournaments of finding the most typical instance w 1 . The only tournaments

we need to rerun are on the nodes containing w 1 . First, we run a new tournament at

the leaf node N containing w 1 , but do not include w 1 in the new tournament. Then,

the winner w 1 is sent to N p , the parent of N , and a new tournament is run there by

replacing w 1 by w 1 . A series of m tournaments are needed to find a new winner w 2

in the root node, which is the approximation of the second most typical instance. At

each level of the LT-tree, only one node needs to run a tournament. For example, in

{

}

Ranking Queries on Uncertain Data

Search WWH ::

Custom Search

Home