Top-k Typicality Queries on Uncertain Data - Ranking Queries on Uncertain Data

Database Reference

In-Depth Information

4.3.2.1 DLTA: Direct Local Typicality Approximation

Theorem 4.5 can be directly used to approximate discriminative typicality. Given a

neighborhood threshold

, for each instance x

∈

O , we compute the

-neighborhood

{

}

in O and S , respectively, and thus its local discriminative typicality. Searching

the

-neighborhood can also be done using a VP-tree, as described in Section 4.2.2.

The cost of computing the local discriminative typicality of an instance x

∈

( |

( {

, σ ) | )

is O

. The overall cost of computing the local discriminative

typicality of all instances in the target object O is O

( ∑ x ∈ O |

( {

, σ ) | )

.As

analyzed in Section 4.2.2, the

-neighborhood of an instance may contain the whole

data set in the worst case. Thus, the time complexity of the direct local typicality

approximation method for discriminative typicality is O

, where m is the

cardinality of the target object O , and n is the number of instances in S . However,

data is often distributed as clusters in practice, and the number of instances con-

tained in the

(

))

-neighborhood of each instance is often small.

4.3.2.2 LT3: Local Typicality Approximation Using Tournaments

The local typicality approximation method using tournaments (LT3) method can be

extended to answer top- k discriminative typicality queries, which follows the same

framework as answering top- k simple typicality queries. The only difference is that,

in the tournament in each node, local discriminative typicality is computed, instead

of local simple typicality.

Similar to Theorem 4.3, we have the following guarantee of the quality of an-

swering top- k discriminative typicality queries using the LT3 method.

Theorem 4.6. Given two uncertain objects O and S, let o be the instance in O of the

largest discriminative typicality and

o be an instance computed by the LT3 method

using local discriminative typicality approximation with the tournament group size

t. Then,

e − σ 2

h √ 2

(

) −

(

) <

log t |

2 h 2

Proof. As indicated by Inequality 4.8 in Theorem 4.5, at each level of the tourna-

e − σ 2

h √ 2

ment, an error up to

in terms of discriminative typicality is introduced.

2 h 2

For objects O and S , there are

log t |

levels of tournaments. Thus, we have

the inequality in the theorem.

Sampling method introduced in Section 4.2.3.3 can also be used to bound the

runtime of the LT3 algorithm for discriminative typicality approximation.

Ranking Queries on Uncertain Data

Search WWH ::

Custom Search

Home