Database Reference
In-Depth Information
(Poosola et al., 1999; Babcock et al., 2002). Each
type has a different approach to construct the
buckets which influences obviously the accuracy
of the histogram.
Further techniques building synopses for data
streams are based on wavelets. Wavelets have
been used in a variety of fields, especially in
signal processing and data and image compres-
sion. In the database area they have been used
for selectivity estimation (with wavelet-based
histograms), query approximation (e.g., data cube
approximation) and clustering (Keim and Heczko,
2001). The general process of using wavelets for
these applications is the decomposition of the in-
put data by means of the wavelet transformation,
processing the approximated data in wavelet space,
i.e., in its decomposed representation, and finally
retransforming the results by wavelet synthesis.
The decomposition is done in several levels,
where on each level the data is more and more
summarized. In the wavelet space the coefficients
of the higher levels can give information about
trends in the overall data while coefficients of the
lower level represent local trends (Aggarwal and
Yu, 2007). Applied to data streams, the wavelet-
based approaches have also to obey space and
memory limitations and have to cope with the
one-pass characteristics and continuous updates
of data streams. Gilbert et al. (2003) present an
approach to create wavelet-based synopses for
data streams, where the data stream represents
the input signal for the wavelet transformation.As
data streams, they update the synopsis of the data
read so far striving at the calculation of the highest
B-term approximation. For this approximation
only a small set of wavelet base coefficients, the
B- term coefficients, is used which provide the
highest 'energy', i.e., which best describe the
stream data. The reconstructed stream based on the
B-coefficients is then called the best B-term ap-
proximation, which poses a minimal sum squared
error compared to the original data.
Sketches are also a common synopsis technique
in data streams. Essentially, these methods have
been borrowed from time series analysis and are
based on the principles of random projection.
Random projection is used to map vectors (data
points) from a high-dimensional space to a lower
dimension. A well known sketching technique in
data streams are Alon-Matias-Szegedy (AMS)
sketches introduced by Alon et al. (1999). AMS
sketches are based on the notion of so called
frequency moments. For example let A = (a 1 ; a 2 ;
…; a m ) be a sequence of elements and N = {1; 2;
…; n} be the domain of the a i . Then the frequency
moments are defined by
å 1
k
F
=
nm
k
i
i
=
where m i is the number of occurrences of an
element in a data stream and n the dimension of
the value domain. F 0 is then the number of distinct
elements in the data stream, F 1 the entire number
of elements and F 2 the repeat rate (or Gini's index
of homogeneity) and F the most frequent item's
multiplicity. The frequency moments measure has
been used by Alon et al. (1999) to create a random-
ized join-size estimator for data streams, where the
values of the data streams are projected on a family
of random variables to estimate the join size fol-
lowing the idea of random linear projection.
TheAMS sketches technique has been used for
data streams in several approaches. For example,
Cormode and Garofalakis (2007) enhance the
AMS sketches technique to approximate complex
aggregate queries on probabilistic data streams
(so called probabilistic AMS sketches), i.e., to
each tuple an independent existence probability
is assigned according to a certain probability dis-
tribution. Rao and Moon (2006) extend the idea
of AMS sketches to build one-pass sketches for
streams of labeled trees (e.g., XML documents).
A very efficient sketch technique addressing the
counting problem (i.e., counting the distinct values
in a data set) are Flajolet-Martin (FM) sketches
(Flajolet and Martin, 1985). The authors propose a
one-pass counting approach for the distinct values
Search WWH ::




Custom Search