Database Reference
In-Depth Information
tion of an uncertain object is unavailable, a set of samples (called instances) are
collected to approximate the probability distribution.
In [41, 42, 43, 44], probabilistic graphical models are used to represent correla-
tions among probabilistic tuples. Moreover, Sen et al. [45, 46] studied the problem
of compact representation of correlations in probabilistic databases by exploiting the
shared correlation structures. Probabilistic graphical models are reviewed in Sec-
tion 3.6.2.
In addition, uncertainty in data integration is studied in [9, 2], where probabilis-
tic schema mapping is modeled as a distribution over a set of possible mappings
between two schemas.
On the system level, Orion [47, 48] is an uncertain database management system
that supports the attribute and tuple level uncertainty with arbitrary correlations.
Three basic operations are defined to perform selection and to compute marginal
distributions and joint distributions. Other relational operations can be defined based
on the three basic operations. Both continuous and discrete uncertainty is handled
in Orion [47, 48].
3.1.2 Probabilistic Queries on Uncertain Data
Cheng et al. [18] provided a general classification of probabilistic queries and eval-
uation algorithms over uncertain data sets. Different from the query answering in
traditional data sets, a probabilistic quality estimate was proposed to evaluate the
quality of results in probabilistic query answering. Dalvi and Suciu [11] proposed
an efficient algorithm to evaluate arbitrary SQL queries on probabilistic databases
and rank the results by their probability. Later, they showed in [49] that the complex-
ity of evaluating conjunctive queries on a probabilistic database is either PTIME or
# P -complete.
Chen et al. [50] studied aggregate queries on data whose attributes may take
“partial values”, where a “partial value” is a set of possible values with only one
being true. The answer to an aggregate query involving such attributes is also in the
form of a “partial value”. In [51], the efficient evaluation of sum , count , min and
max queries on probabilistic data is studied based on Monte Carlo simulation. Only
the top- k answers to the query with the highest probabilities are returned. Burdick et
al. [52, 53, 54] extended the OLAP model on imprecise data. Different semantics
of aggregation queries are considered over such data. Following with the possible
worlds semantics model [23, 24, 7, 12], the answer to an aggregation query is rep-
resented as an answer random variable with certain probability distribution over a
set of possible values. In [55], several one pass streaming algorithms are proposed
to estimate statistical aggregates of a probabilistic data stream, which contains a
(potentially infinite) sequence of uncertain objects.
Cheng et al. [56] explored the join queries over data sets with attribute level
uncertainty, where the values of a tuple in the join attributes are probability dis-
tributions in a set of value intervals. In [57], join queries are studied on data sets
Search WWH ::




Custom Search