Top-k Typicality Queries on Uncertain Data - Ranking Queries on Uncertain Data

Database Reference

In-Depth Information

typicality score. After the first answer instance is added into A , to find the approxi-

mation of the next most representatively typical instance, a tournament is conducted

from bottom up. In each node of the LT-tree, we compute the local representative

typicality of each instance, and select the instance with the greatest local representa-

tive typicality score as the winner, and let it go to the tournament in the parent node.

The computation of local representative typicality is similar to the local representa-

tive typicality computation in the DLTA algorithm.

There is one critical difference between the LT3 algorithm for simple typicality

computation and the LT3 algorithm for representative typicality computation. In

simple typicality computation, once the first winner instance is computed, we only

need to re-conduct part of the tournament to find the next winner instance. However,

the representative typicality score of each instance changes once the reported answer

set is updated. Thus, no results can be reused. A new tournament among the rest

of the instances should be conducted from bottom up completely. Therefore, the

LT3 method for representative typicality computation is not as efficient as the LT3

method for simple typicality or discriminative typicality computation.

Similar to Theorem 4.3, the quality of answering top- k representative typicality

queries using the LT3 method is guaranteed.

Theorem 4.8. In an uncertain object O, let o be the instance of the largest repre-

sentative typicality and

o be an instance computed by the LT3 method using local

representative typicality approximation and the tournament group size t. Then,

2

h √ 2

e − σ 2

RT

(

o

,

A

,

O

) −

RT

(

o

,

A

,

O

) <

·

log t |

S

|

2 h 2

π

Proof. As indicated by Inequality 4.13 in Theorem 4.7, at each level of tournament,

an error up to

e − σ 2

2

h √ 2

2 h 2 in terms of the difference of representative typicality is

introduced. For an uncertain object O , there are

π

log t |

O

|

levels of tournaments.

4.5 Empirical Evaluation

In this section, we report a systematic empirical study using real data sets and syn-

thetic data sets. All experiments were conducted on a PC computer with a 3.0 GHz

Pentium 4 CPU, 1

0 GB main memory, and a 160 GB hard disk, running the Mi-

crosoft Windows XP Professional Edition operating system. Our algorithms were

implemented in Microsoft Visual C++ V6.0.

.

Ranking Queries on Uncertain Data

Search WWH ::

Custom Search

Home