Related Work - Ranking Queries on Uncertain Data

Database Reference

In-Depth Information

tion of an uncertain object is unavailable, a set of samples (called instances) are

collected to approximate the probability distribution.

In [41, 42, 43, 44], probabilistic graphical models are used to represent correla-

tions among probabilistic tuples. Moreover, Sen et al. [45, 46] studied the problem

of compact representation of correlations in probabilistic databases by exploiting the

shared correlation structures. Probabilistic graphical models are reviewed in Sec-

tion 3.6.2.

In addition, uncertainty in data integration is studied in [9, 2], where probabilis-

tic schema mapping is modeled as a distribution over a set of possible mappings

between two schemas.

On the system level, Orion [47, 48] is an uncertain database management system

that supports the attribute and tuple level uncertainty with arbitrary correlations.

Three basic operations are defined to perform selection and to compute marginal

distributions and joint distributions. Other relational operations can be defined based

on the three basic operations. Both continuous and discrete uncertainty is handled

in Orion [47, 48].

3.1.2 Probabilistic Queries on Uncertain Data

Cheng et al. [18] provided a general classification of probabilistic queries and eval-

uation algorithms over uncertain data sets. Different from the query answering in

traditional data sets, a probabilistic quality estimate was proposed to evaluate the

quality of results in probabilistic query answering. Dalvi and Suciu [11] proposed

an efficient algorithm to evaluate arbitrary SQL queries on probabilistic databases

and rank the results by their probability. Later, they showed in [49] that the complex-

ity of evaluating conjunctive queries on a probabilistic database is either PTIME or

# P -complete.

Chen et al. [50] studied aggregate queries on data whose attributes may take

“partial values”, where a “partial value” is a set of possible values with only one

being true. The answer to an aggregate query involving such attributes is also in the

form of a “partial value”. In [51], the efficient evaluation of sum , count , min and

max queries on probabilistic data is studied based on Monte Carlo simulation. Only

the top- k answers to the query with the highest probabilities are returned. Burdick et

al. [52, 53, 54] extended the OLAP model on imprecise data. Different semantics

of aggregation queries are considered over such data. Following with the possible

worlds semantics model [23, 24, 7, 12], the answer to an aggregation query is rep-

resented as an answer random variable with certain probability distribution over a

set of possible values. In [55], several one pass streaming algorithms are proposed

to estimate statistical aggregates of a probabilistic data stream, which contains a

(potentially infinite) sequence of uncertain objects.

Cheng et al. [56] explored the join queries over data sets with attribute level

uncertainty, where the values of a tuple in the join attributes are probability dis-

tributions in a set of value intervals. In [57], join queries are studied on data sets

Search WWH ::

Custom Search

Home