Database Reference
In-Depth Information
Even with these techniques, query evaluation remains a major challenge. It can be observed
that a key challenge to efficiency is avoiding the selectivity trap : database queries are often highly
selective, through selections and joins, and many of the tuples of the input database(s) do not
survive the path through the query to the result. This has been already observed in early work on
online aggregation [ Hellerstein et al. , 1997 ]. The same issue applies to MCDBs, and a promising
solution is to postpone sampling as long as possible during query evaluation, working with summary
representations of all possible worlds until operators such as tuple probability computations force
the system to produce samples to proceed. The PIP system [ Kennedy and Koch , 2010 ] does just
this using pc-table representations of probabilistic databases. As shown there, pc-tables generalize
tuple bundles in their power to compactly represent many samples (in fact, all possible worlds).
Since, as discussed earlier, pc-tables are a strong representation system for relational algebra, the
relational algebra part of a query can be evaluated by transforming the c-table representation, without
touching the representation of the probability distribution modeled by the random variables in the
tuple conditions. Samples are only generated after the evaluation of the relational algebra operations
of the query is finished, and the costly and unnecessary sampling of data that would fall victim to
filtering in the relational algebra can be suitably counteracted while preserving the correctness of the
overall query result.
6.4 INDEXES AND MATERIALIZED VIEWS
In relational databases, indexes and materialized views are two powerful and popular techniques
to improve query performance. Achieving high query performance is a major technical challenge
for probabilistic databases, and researchers have naturally adapted indexing and materialized view
techniques to probabilistic databases. Probabilistic data, however, presents new challenges that are
not found in relational database management systems. A first conceptual reason that probabilistic
indexing differs from relational indexing is that probabilistic databases allow new types of queries.
For example, consider an environmental monitoring application where we are measuring the tem-
perature of a physical space. The sensors can report the temperature in the room only to within some
confidence. In this setting, one may ask “return the ids of all sensors whose temperature is in some critical
range with probability greater than 0 . 6 . To answer this query, one alternative is to probe each of the
sensors and ask them to report their temperature reading. Alternatively, one could use an index and
quickly identify those sensors that meet the above criteria.
Indexing and materialized view techniques also need to be rethought for technical reasons. For
example, in an RDBMS, each tuple can be processed independently. In turn, the RDBMS leverages
this freedom to layout the data to make query retrieval efficient. In contrast, tuples in a probabilistic
databases may be correlated in non-obvious ways.
6.4.1 INDEXES FOR PROBABILISTIC DATA
The first set of indexing for probabilistic databases were concerned with continuous probabilistic
databases to support queries called probabilistic threshold queries (PTQ) [ Cheng et al. , 2004 , Qi et al. ,
Search WWH ::




Custom Search