Database Reference
In-Depth Information
query usually must be deregistered after some
time. Finally, both bounds can be fixed, yielding
so called fixed-bound windows, which enable
selection of records in a certain interval of time
in the stream (if available).
Especially in data warehousing it may be inter-
esting to aggregate values for different granulari-
ties of time. At each aggregation level along the
time dimension a value has to be calculated based
on the records seen so far. Therefore, a landmark
window can be used, which aggregates the data
streaming in to an aggregation level, such as one
hour. This approach guarantees that there always
exists a preliminary value for the current hour.Af-
ter one hour, the lower bound of the window may
be shifted. This idea is called tilted time frame (Han
et al., 2007). Three different scaling options are
described. In the natural tilted time frame model,
the time scale is natural, which means that time
portions are mapped as they occur (15 minutes
represented by 15 slots, 4 quarters by 4 slots and
so on). Other models are the logarithmic tilted time
frame model using a logarithmic time scale and
progressive logarithmic tilted time frame model
which stores snapshots of the data in logarithmic
scale according to their actuality.
approximate join estimation), computation of ag-
gregate statistics and data mining (Aggarwal and
Yu, 2007). This section will give a brief overview
of synopses. A detailed discussion is beyond the
scope of this chapter and the interested reader is
referred to more detailed literature (Babcock et
al., 2002; Aggarwal and Yu, 2007).
There exist different synopsis techniques in-
troduced for traditional databases and time series
analysis. An easy approach to build a synopsis
of a stream is sampling. For example, random
sampling selects randomly tuples from the data
stream and stores them as a summary. However,
it depends on the variety of the stream data if the
selected samples constitute an adequate repre-
sentation. Sampling poses the advantage that the
multi-dimensional data representation of the data
stream is kept and is therefore well suited for data
mining applications (Aggarwal, 2006). Reservoir
sampling techniques are a probabilistic method
which collects tuples in a reservoir either without
restriction (unbiased) or the size of the reservoir is
restricted by an unbiased or biased function, which
determines, if a tuple is added to the reservoir or
not (Aggarwal, 2006).
Sliding windows as already introduced in Sec-
tion 5.2 can be seen as a special case of reservoir
sampling, which are temporally biased as they only
take into account recent tuples. Concise sampling
improves reservoir sampling by summarizing the
distinct values of a stream which are assumed to
fit into memory (Aggarwal and Yu, 2007).
Histograms have already successfully been
applied for approximate query answering in tra-
ditional database systems (Poosola et al., 1999).
Histograms partition data into buckets according
to a partitioning rule, where each bucket represents
a set of attribute values. The attribute values are
represented on the x-axis of the histogram and
their quantity or frequency on the y-axis. If a user
query, such as a range query is issued, it can be
easily answered approximately using histograms.
There are different types of histograms, such as
equi-width, equi-depth, or v-optimal histograms
Synopsis construction
As we have seen in the previous sections, a data
stream is an unbounded sequence of tuples, which
is not feasible or desirable to be stored and que-
ried in its entirety. Especially, space complexity
is an important issue in data stream processing.
Hence, DSMS aim at processing each tuple in a
one-pass manner.
Besides window models, another option for
approximating data streams for queries are syn-
opses. Synopses provide a summarization of the
data stream using specific algorithms and can be
stored using the available resources. The stored
summary can then be used to answer user queries
with a satisfactory accuracy. Other application
fields of synopses are query optimization (using
Search WWH ::




Custom Search