Conclusions - Ranking Queries on Uncertain Data

Database Reference

In-Depth Information

Computing the exact answers to top- k typicality queries is quadratic in nature.

We develop three efficient approximation algorithms, a randomized tournament

method, a local typicality approximation algorithm that approximates the typi-

cality score of an instance o using a small subset of instances close to o , and a

hybrid method that combines the tournament mechanism and the local typicality

approximation.

•

We develop probabilistic ranking queries and reverse probabilistic ranking

queries on uncertain data by considering the two dimensions: the ranks of un-

certain instances or objects and their probabilities of achieving certain ranks.

Given a rank parameter k , we can rank instances or objects according to their

top- k probabilities (that is, the probabilities of being ranked top- k ). A prob-

abilistic ranking query finds the uncertain instances or objects whose top- k

probabilities are the largest.

Given a probability threshold p , the p -rank of an instance o is the minimum k

such that Pr k

(

) ≥

p . Similarly, the p -rank of an object O is the minimum k

such that Pr k

p .A reverse probabilistic ranking query finds the uncer-

tain instances or objects whose p -ranks are the smallest.

(

) ≥

Probabilistic ranking queries and reverse probabilistic ranking queries provide

complement views of the ranking characterizes of uncertain data. Moreover, we

develop three efficient query evaluation methods, an exact algorithm, a sampling

method, and a Poisson approximation based method. Last, we devise an effec-

tive and compact index for probabilistic ranking queries and reverse probabilistic

ranking queries, which helps efficient online query answering.

•

We study the problem of continuous probabilistic top- k query (continuous PT- k

query for short) on uncertain streams, which reports, for each time instant t , the

set of uncertain streams satisfying the query in the sliding window. This problem

extends the ranking queries on static data to dynamic data. To tackle this prob-

lem, we develop an exact algorithm that involves several important stream spe-

cific techniques and a sampling method. Moreover, we devise the space efficient

versions of the exact algorithm and the sampling method that utilize approximate

quantiles to maintain the summary of the data streams.

•

We study ranking queries on the probabilistic linkage model, which is an exten-

sion of the basic uncertain object model by considering the dependencies among

objects. Given two sets of tuples A and B , a linkage matches a tuple in A to a tuple

in B , meaning that the two tuples refer to the same real world entity. A probability

is associated with each linkage, representing the confidence of the match. Each

tuple can be associated with multiple linkages, but only one linkage can be true

due to the constraint that one tuple can only match at most one tuple in another

data set in the ground truth.

We can consider each tuple t A ∈

A as an uncertain object. A tuple t B ∈

B can be

considered as an instance of t A if there is a probabilistic linkage l

t A ,

t B ) ∈L

such that Pr

(

) >

0. The membership probability of instance t B with respect to

object t A is Pr

. Different from the basic uncertain object model where each

instance only belongs to one object, in the probabilistic linkage model, a tuple

(

)

Ranking Queries on Uncertain Data

Search WWH ::

Custom Search

Home