View Management Techniques and Their Application to Data Stream Management - Evolving Application Domains of Data Warehousing and Mining

Database Reference

In-Depth Information

query usually must be deregistered after some

time. Finally, both bounds can be fixed, yielding

so called fixed-bound windows, which enable

selection of records in a certain interval of time

in the stream (if available).

Especially in data warehousing it may be inter-

esting to aggregate values for different granulari-

ties of time. At each aggregation level along the

time dimension a value has to be calculated based

on the records seen so far. Therefore, a landmark

window can be used, which aggregates the data

streaming in to an aggregation level, such as one

hour. This approach guarantees that there always

exists a preliminary value for the current hour.Af-

ter one hour, the lower bound of the window may

be shifted. This idea is called tilted time frame (Han

et al., 2007). Three different scaling options are

described. In the natural tilted time frame model,

the time scale is natural, which means that time

portions are mapped as they occur (15 minutes

represented by 15 slots, 4 quarters by 4 slots and

so on). Other models are the logarithmic tilted time

frame model using a logarithmic time scale and

progressive logarithmic tilted time frame model

which stores snapshots of the data in logarithmic

scale according to their actuality.

approximate join estimation), computation of ag-

gregate statistics and data mining (Aggarwal and

Yu, 2007). This section will give a brief overview

of synopses. A detailed discussion is beyond the

scope of this chapter and the interested reader is

referred to more detailed literature (Babcock et

al., 2002; Aggarwal and Yu, 2007).

There exist different synopsis techniques in-

troduced for traditional databases and time series

analysis. An easy approach to build a synopsis

of a stream is sampling. For example, random

sampling selects randomly tuples from the data

stream and stores them as a summary. However,

it depends on the variety of the stream data if the

selected samples constitute an adequate repre-

sentation. Sampling poses the advantage that the

multi-dimensional data representation of the data

stream is kept and is therefore well suited for data

mining applications (Aggarwal, 2006). Reservoir

sampling techniques are a probabilistic method

which collects tuples in a reservoir either without

restriction (unbiased) or the size of the reservoir is

restricted by an unbiased or biased function, which

determines, if a tuple is added to the reservoir or

not (Aggarwal, 2006).

Sliding windows as already introduced in Sec-

tion 5.2 can be seen as a special case of reservoir

sampling, which are temporally biased as they only

take into account recent tuples. Concise sampling

improves reservoir sampling by summarizing the

distinct values of a stream which are assumed to

fit into memory (Aggarwal and Yu, 2007).

Histograms have already successfully been

applied for approximate query answering in tra-

ditional database systems (Poosola et al., 1999).

Histograms partition data into buckets according

to a partitioning rule, where each bucket represents

a set of attribute values. The attribute values are

represented on the x-axis of the histogram and

their quantity or frequency on the y-axis. If a user

query, such as a range query is issued, it can be

easily answered approximately using histograms.

There are different types of histograms, such as

equi-width, equi-depth, or v-optimal histograms

Synopsis construction

As we have seen in the previous sections, a data

stream is an unbounded sequence of tuples, which

is not feasible or desirable to be stored and que-

ried in its entirety. Especially, space complexity

is an important issue in data stream processing.

Hence, DSMS aim at processing each tuple in a

one-pass manner.

Besides window models, another option for

approximating data streams for queries are syn-

opses. Synopses provide a summarization of the

data stream using specific algorithms and can be

stored using the available resources. The stored

summary can then be used to answer user queries

with a satisfactory accuracy. Other application

fields of synopses are query optimization (using

Evolving Application Domains of Data Warehousing and Mining

Search WWH ::

Custom Search

Home