Probabilistic Ranking Queries on Uncertain Data - Ranking Queries on Uncertain Data

Database Reference

In-Depth Information

there are. So he asks,“ Who are the top-3 typical guards in whole representing

different types of guards? ”

Simple typicality does not provide the correct answer to this question, since the

3 players with the greatest simple typicality may be quite similar to each other,

while some other popular players different from those three may be missed. Dis-

criminative typicality does not help either, because the exact types of guards and

their members are unknown.

To solve this problem, we develop the notion of representative typicality that mea-

sures how an instance is typical in an uncertain object different from the already

reported typical instances. Given an uncertain object O, a top-k representative

typicality query finds a set of k instances of O with the highest representative

typicality scores.

By default, we consider an uncertain object O on attributes A 1

,···,

A n . Let

A i 1 ,···,

≤

A i l be the attributes on which the typicality queries are applied (1

i j

for 1

≤

l ) and d A i 1 ,···, A i l (

)

be the distance between two instances x and y in S

on attributes A i 1 ,···,

A i l . When A i 1 ,···,

A i l

are clear from context, d A i 1 ,···, A i l (

)

abbreviated to d

We address the top- k typicality problem in a generic metric space. Therefore, the

distance metric d should satisfy the triangle inequality.

(

)

2.2.1.1 Simple Typicality

By intuition and as also suggested by the previous research in psychology and cogni-

tive science (as will be reviewed in Section 3.3.1), an instance o in O is more typical

than the others if o is more likely to appear in O . As discussed in Section 2.1.1, the

set of instances in O on attributes A 1 ,···,

A n can be viewed as a set of independent

and identically distributed samples of an n -dimensional random vector

that takes

values in the Cartesian product space D

D A 1 ×···×

D A n , where D A i is the domain

of attribute A i (1

≤

n ). The likelihood of o

∈

O , given that o is a sample of

can be used to measure the typicality of o .

Definition 2.5 (Simple typicality). Given an uncertain object O on attributes

A 1 ,···,

A n and a subset of attributes A i 1 ,···,

A i l

≤

i j ≤

n for 1

≤

l ) of in-

terest, let

be the n -dimensional random vector generating the instances in O , the

simple typicality of an instance o

∈

O with respect to

on attributes A i 1 ,···,

A i l

is defined as T A i 1 ,···, A i l (

,X )=

L A i 1 ,···, A i l (

|X )

where L A i 1 ,···, A i l (

|X )

is the like-

lihood [27] of o on attributes A i 1 ,···,

A i l , given that o is a sample of

In practice, since the distribution of random vector

is often unknown, we

use T A i 1 ,···, A i l (

L A i 1 ,···, A i l (

)

as an estimator of T A i 1 ,···, A i l (

,X )

, where

L A i 1 ,···, A i l (

)

is the posterior probability of an object o on attributes A i 1 ,···,

A i l

given O [27].

L A i 1 ,···, A i l (

can be computed using density estimation methods. We adopt the

commonly used kernel density estimation method, which does not require any distri-

bution assumption on O . The general idea is to use a kernel function to approximate

)

Ranking Queries on Uncertain Data

Search WWH ::

Custom Search

Home