Top-k Typicality Queries on Uncertain Data - Ranking Queries on Uncertain Data

Database Reference

In-Depth Information

4.4 Answering Representative Typicality Queries

The representative typicality for an instance o in an uncertain object O with respect

to an reported answer set A was defined in Definition 2.9. In this section, we first

propose a straightforward approach to find the exact answer of a top- k representative

typicality query. Then, we will discuss how to extend the approximation techniques

proposed for simple typicality queries to efficiently answer top- k representative typ-

icality queries.

4.4.1 An Exact Algorithm and Complexity

When the answer set A is empty, the most representatively typical instance is simply

the most typical instance o 1 , which can be computed using Algorithm 4.1. After o 1

is added to A , the group typicality GT

(

)

is the simple typicality score T

(

o 1 ,

)

since all members in uncertain object O are represented by o 1 .

Then, in order to compute the next instance with the maximal representative

typicality score, according to Definition 2.9, we have to compute the group typicality

score G

(

∪{

)

for each instance o

∈ (

−

)

and select the instance with the

maximal score.

To compute GT

(

∪{

)

, we first construct N

(

)

for instance o as fol-

lows. We scan all instances in

(

−

)

. For each instance x

∈ (

−

)

, suppose

o ,

for an instance o ∈

A , which means that o is the instance closest

∈

(

)

o ,

to x in A .If d

(

) <

(

)

, then x is removed from N

(

)

and is added to

. To make the computation efficient, we maintain the minimum distance

from an instance x

(

)

to the instances in A by a 1-dimensional array. The

minimum distances are updated every time after a new object is added into A .

Then, T

∈ (

−

)

(

))

(

)

, the simple typicality of o in N

, is computed using

is | N ( o , A , O ) |

| O |

. For other instances o ∈

(

))

Algorithm 4.1. Probability Pr

A ,

o ,

since N

(

)

may be changed, the simple typicality scores T

(

))

and

o ,

))

(

∪{

)

(

are updated accordingly. Last, GT

can be calculated

according to Definition 2.8.

The above procedure is repeated to find the next most representatively typical

instance, until k instances are found.

The complexity of the exact algorithm is O

kn 2

because each time after an in-

stance is added to A , the representative typicality scores of all instances in

(

)

need to be recomputed to find the next instance with the largest representative typi-

cality score.

(

−

Ranking Queries on Uncertain Data

Search WWH ::

Custom Search

Home