1. OVERVIEW
Scientific data management is a major application domain for probabilistic databases. One of the
early works recognizing this potential is by Nierman and Jagadish [ 2002 ]. They describe a system,
ProTDB (Probabilistic Tree Data Base), based on a probabilistic XML data model, and they apply it to
protein chemistry data from the bioinformatics domain. Detwiler et al. [ 2009 ] describe BioRank, a
mediator-based data integration system for exploratory queries that keeps track of the uncertainty
introduced by joining data elements across sources as well as the inherent uncertainty in scientific
data. The system uses this uncertainty to rank uncertain query results, and the authors apply it to
protein function prediction. They show that the use of probabilities increases the system's ability
to predict less-known or previously unknown functions, but is no more effective than deterministic
methods at predicting well-known functions. Potamias et al. [ 2010 ] describe an application of
probabilistic databases to the study of protein-protein interactions. They consider the
protein-protein interaction (PPI) network created by Krogan et al. [ 2006 ], in which two proteins are
linked if they are likely to interact, and model it as a probabilistic graph. Another application of
probabilistic graph databases to protein prediction is described by Zou et al. [ 2010 ].
Voronoi diagrams on uncertain data are considered by Cheng et al. [ 2010b ].
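Potamias et al. [ 2010 ] model the PPI network as a probabilistic graph. A common reading of such a graph, assumed here rather than taken from the paper, is that each edge exists independently with its probability, so every subset of edges is one possible world. The minimal sketch below computes the probability that two proteins are connected, exactly by enumerating all worlds and approximately by sampling them. The protein names, edge probabilities, and function names are hypothetical.

```python
import itertools
import random
from collections import defaultdict, deque

def build_adjacency(kept_edges):
    """Adjacency sets for one possible world (an ordinary, deterministic graph)."""
    adj = defaultdict(set)
    for u, v in kept_edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def connected(adj, s, t):
    """Breadth-first search: is t reachable from s in this world?"""
    seen, queue = {s}, deque([s])
    while queue:
        x = queue.popleft()
        if x == t:
            return True
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False

def exact_reachability(edges, s, t):
    """Sum the probabilities of all 2^m worlds in which s reaches t.
    Assumes independent edges; exponential in m, so only for tiny graphs."""
    items = list(edges.items())
    total = 0.0
    for keep in itertools.product([False, True], repeat=len(items)):
        prob = 1.0
        kept = []
        for ((u, v), p), k in zip(items, keep):
            prob *= p if k else 1 - p
            if k:
                kept.append((u, v))
        if connected(build_adjacency(kept), s, t):
            total += prob
    return total

def sampled_reachability(edges, s, t, samples=20000, seed=0):
    """Monte Carlo estimate: sample worlds by keeping each edge independently."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        kept = [e for e, p in edges.items() if rng.random() < p]
        hits += connected(build_adjacency(kept), s, t)
    return hits / samples

# Hypothetical interaction probabilities between three proteins.
EDGES = {("A", "B"): 0.9, ("B", "C"): 0.8, ("A", "C"): 0.5}
```

Here `exact_reachability(EDGES, "A", "C")` equals 1 - 0.5 × (1 - 0.9 × 0.8) = 0.86 up to floating-point rounding, and the sampled estimate should land close to it. Sampling is the practical choice on real networks, where enumerating 2^m worlds is infeasible.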
Dong et al. [ 2009 ] consider uncertainty in data integration; they introduce the concept of
probabilistic schema mappings and analyze their formal foundations. They consider two possible
semantics, by-table and by-tuple. Gal et al. [ 2009 ] study how to answer aggregate queries with
COUNT, AVG, SUM, MIN, and MAX over such mappings under both by-table and by-tuple
semantics. Cheng et al. [ 2010a ] study the problem of managing possible mappings between
two heterogeneous XML schemas, and they propose a data structure for representing these mappings
that takes advantage of their high degree of overlap. van Keulen and de Keijzer [ 2009 ] consider user
feedback in probabilistic data integration. Fagin et al. [ 2010 ] consider probabilistic data exchange
and establish a foundational framework for this problem.
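To make the two semantics concrete, the sketch below evaluates a COUNT query, the kind of aggregate Gal et al. [ 2009 ] study, over a hypothetical probabilistic mapping. It assumes a much simplified mapping model (a single target attribute, with each alternative naming the source attribute it is copied from); the relation, attribute names, and probabilities are invented for illustration. Under by-table semantics one mapping is chosen for the whole table; under by-tuple semantics each tuple chooses its mapping independently, so the COUNT distribution is a convolution of per-tuple probabilities.

```python
from collections import defaultdict

# Hypothetical source tuples.
SOURCE = [
    {"current_addr": "Sunnyvale", "permanent_addr": "Seattle"},
    {"current_addr": "Seattle", "permanent_addr": "Sunnyvale"},
    {"current_addr": "Sunnyvale", "permanent_addr": "Sunnyvale"},
]

# Hypothetical probabilistic mapping: alternatives for the target attribute
# "mailing_addr", each with a probability (the probabilities sum to 1).
PMAPPING = [
    ({"mailing_addr": "current_addr"}, 0.6),
    ({"mailing_addr": "permanent_addr"}, 0.4),
]

def satisfies(tup, mapping, attr, value):
    """Translate the source tuple through `mapping` and test target attr == value."""
    return tup[mapping[attr]] == value

def count_by_table(source, pmapping, attr, value):
    """By-table: a single mapping applies to every tuple of the table."""
    dist = defaultdict(float)
    for mapping, p in pmapping:
        c = sum(satisfies(t, mapping, attr, value) for t in source)
        dist[c] += p
    return dict(dist)

def count_by_tuple(source, pmapping, attr, value):
    """By-tuple: each tuple picks a mapping independently, so the count is a
    sum of independent Bernoulli variables; convolve them one tuple at a time."""
    dist = {0: 1.0}
    for t in source:
        q = sum(p for mapping, p in pmapping if satisfies(t, mapping, attr, value))
        new = defaultdict(float)
        for c, pc in dist.items():
            new[c] += pc * (1 - q)
            new[c + 1] += pc * q
        dist = dict(new)
    return {c: p for c, p in dist.items() if p > 0}

def expected(dist):
    """Expected value of a count distribution {count: probability}."""
    return sum(c * p for c, p in dist.items())
```

For the query "COUNT tuples whose mailing_addr is Sunnyvale", the by-table COUNT is 2 with probability 1, while the by-tuple COUNT is 1, 2, or 3 with probabilities 0.24, 0.52, and 0.24. The expected counts agree (2 in both cases), as linearity of expectation guarantees for COUNT, but the distributions differ.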
Several researchers have recognized the need to redesign major components of data
management systems in order to cope with uncertain data. Cormode et al. [ 2009a ] and
Cormode and Garofalakis [ 2009 ] redesign histogram synopses, both for internal DBMS decisions
(such as indexing and query planning) and for approximate query processing. Their histograms
retain the possible-worlds semantics of probabilistic data, allowing for a more accurate, yet concise,
representation of the uncertainty characteristics of data and query results. Zhang et al. [ 2008 ] describe
a data mining algorithm on probabilistic data. They consider a collection of X-tuples and search for
approximately likely frequent items, with guaranteed high probability and accuracy. Rastogi et al.
[ 2008 ] describe how to redesign access control when the database is probabilistic. They
observe that access is often controlled by data; for example, a physician may access a patient's data
only if the database has a record that the physician treats that patient. In a probabilistic database,
however, the grant/deny decision is uncertain. The authors describe a new access control method that
adds noise to the data in proportion to the degree of uncertainty of the access condition.
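Zhang et al. [ 2008 ] search a collection of X-tuples for likely frequent items with approximation guarantees. The minimal sketch below does something simpler and is not their algorithm: for a small, hypothetical set of X-tuples it computes exactly the probability that an item occurs at least k times, assuming X-tuples are independent and the alternatives within one X-tuple are mutually exclusive (leftover probability means the X-tuple is absent). The threshold `k`, confidence `tau`, the data, and the function names are illustrative assumptions rather than the paper's definitions.

```python
# Hypothetical X-tuples: each is a list of mutually exclusive (item, probability)
# alternatives whose probabilities sum to at most 1.
XTUPLES = [
    [("a", 0.7), ("b", 0.3)],
    [("a", 0.5), ("c", 0.4)],
    [("b", 0.9)],
    [("a", 0.6), ("b", 0.4)],
]

def count_distribution(xtuples, item):
    """dist[c] = P(exactly c X-tuples take the value `item`).
    Each X-tuple contributes an independent Bernoulli variable, so the
    distribution is built by convolving them one at a time."""
    dist = [1.0]
    for xt in xtuples:
        q = sum(p for it, p in xt if it == item)
        new = [0.0] * (len(dist) + 1)
        for c, pc in enumerate(dist):
            new[c] += pc * (1 - q)
            new[c + 1] += pc * q
        dist = new
    return dist

def prob_at_least(xtuples, item, k):
    """P(item occurs at least k times) over all possible worlds."""
    return sum(count_distribution(xtuples, item)[k:])

def likely_frequent_items(xtuples, k, tau):
    """Items that occur at least k times with probability at least tau."""
    items = {it for xt in xtuples for it, _ in xt}
    return sorted(it for it in items if prob_at_least(xtuples, it, k) >= tau)
```

With `k = 2`, item `a` reaches the threshold with probability 0.65 and item `b` with probability 0.534, so `likely_frequent_items(XTUPLES, 2, 0.5)` returns `["a", "b"]`, while raising `tau` to 0.6 keeps only `"a"`. The exact convolution costs time quadratic in the number of X-tuples per item, which is why large collections call for approximate methods such as those the paper studies.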
Atallah and Qi [ 2009 ] describe how to extend skyline computation to probabilistic databases, with-