Database Reference
In-Depth Information
in a database by using a family of hash functions
and least significant bits to create a sketch.
Ganguly et al. (2003) adopt the idea of FM
sketches by introducing a data structure called
2-level hash sketch and proposing algorithms
implementing union, difference and intersection
over the 2-level hash sketches. The 2-level hash
sketch uses two independent families (levels) of
hash functions to determine the bit vector, where
the first family are randomizing hash functions
projecting values uniformly over a range while
the second family distributes the values uniformly
over the binary domain.
index for multiway joins. In the Telegraph project
((Chandrasekaran et al., 2003), http://telegraph.
cs.berkeley.edu/) data/computation sharing is
confined to avoidance of copying base tuples
and maintaining independent indexes, called state
modules, for each base table (Raman et al., 2003;
Madden et al., 2002).
Although view selection techniques are highly
interesting for automatic synopsis selection in
stream processing, there are a few challenges to
be resolved first:
Modeling the interrelationships between
synopses, such as subsumption.
Selection of a cost model for cost estima-
towards Automatic
Synopsis Selection
tion in stream processing.
Choice of the optimization target: through-
Zhang et al. (2005) adopt similar ideas to view
selection to achieve shared computation for mul-
tiple aggregations over data streams. They opt for
per unit time processing costs utilizing shared
synopses called phantoms. Phantoms are hash
tables for aggregations of finer granularity than
the original queries, of which the interrelationship
is modeled in a similar way as a lattice. Under an
assumption of a fixed total memory, maintaining
more phantoms may increase the collision rates,
which in turn increases the costs. They start with
the set of original queries, and select at each step
greedily a most beneficial phantom until the benefit
becomes negative. Based on their collision rate
based cost model, the benefit of incorporating
an additional phantom is derived from a sub-
optimization procedure, which selects the optimal
space allocation to minimize the overall costs for
a given set of phantoms.
Up to now, state-of-the-art data stream systems
are still limited to special cases of computation/
synopsis sharing. In the STREAM prototype
((Arasu et al., 2003a), http://infolab.stanford.
edu/stream/), subplan sharing is limited to exact
subexpression matching, while synopsis shar-
ing remains largely unexplored (Motwani et al.,
2003). Babu et al. (2005) discuss sharing a join
put,
per-unit-time-cost,
or
quality-of-
service.
Integration of synopsis sharing with opera-
tor scheduling.
Adaptability of these methods for continu-
ous data streams.
vIeW MAnAgeMent technIqueS
And dAtA StreAM MAnAgeMent
In this section we explore the relationship of view
maintenance and view selection techniques to
data stream management systems. In particular
continuous query processing is related to view
maintenance whereas view selection techniques
can be applied to continuous query planning.
continuous queries and
view Maintenance
According to (Arasu et al., 2006) materialized
views are a form of continuous queries. Views are
continually updated when the underlying base rela-
tions are updated. Incremental view maintenance
techniques are applied to avoid re-evaluation of
the view definition on complete base relations.
Early implementations of continuous query se-
Search WWH ::




Custom Search