View Management Techniques and Their Application to Data Stream Management - Evolving Application Domains of Data Warehousing and Mining


Stream query languages

restrict the discussion to relational data streams.

A stream is defined as a bag of elements, each consisting of a relational tuple and a timestamp.

Definition 1 (Data Stream) A stream S is a (possibly infinite) bag (multiset) of elements <s, τ>, where s is a tuple belonging to the schema of S and τ ∈ T is the timestamp of the element.
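To make Definition 1 concrete, the following Python sketch models a finite prefix of a stream as a bag using `collections.Counter`, so both duplicate elements and several elements with the same timestamp are permitted. The names `StreamElement`, `make_stream` and `elements_at` are hypothetical, and integer timestamps stand in for the time domain T; a real DSMS would process the stream incrementally rather than materialize it.

```python
from collections import Counter
from typing import NamedTuple, Tuple

# Hypothetical representation of a stream element <s, τ>:
# s is a relational tuple, tau an integer timestamp (T is assumed to be the integers).
class StreamElement(NamedTuple):
    s: Tuple
    tau: int

def make_stream(elements):
    """Return a finite prefix of a stream as a bag (multiset) of elements."""
    return Counter(StreamElement(s, tau) for s, tau in elements)

def elements_at(stream, tau):
    """All elements with timestamp tau; their number is unbounded but finite."""
    return Counter({e: n for e, n in stream.items() if e.tau == tau})

S = make_stream([
    (("A1", 87.5), 1),
    (("A1", 87.5), 1),   # duplicate element: kept, because S is a bag
    (("B7", 42.0), 1),
    (("A1", 91.0), 2),
])

print(sum(S.values()))                  # 4 elements in total
print(sum(elements_at(S, 1).values()))  # 3 elements share timestamp 1
```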

There is no bound on the number of tuples with the same timestamp, except that this number must be finite. Since the abstract semantics of their query language relies heavily on timestamps, queries must not be allowed to transform them; therefore, Arasu et al. (2006) chose not to include the timestamp in the schema. However, it is often necessary to refer to the timestamp in query predicates. This can be achieved by mirroring the timestamp in the schema, i.e., by carrying a copy of it as an ordinary attribute.
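Mirroring can be illustrated with a small, hypothetical Python sketch: the system-managed timestamp `tau` stays outside the tuple, and `mirror_timestamp` appends a copy of it as an ordinary attribute that a selection predicate may read. The helper names and the choice of appending the copy as the last attribute are assumptions made for illustration only.

```python
from typing import NamedTuple, Tuple

class StreamElement(NamedTuple):
    s: Tuple      # relational tuple (the schema does not contain the timestamp)
    tau: int      # timestamp, managed by the system, not by queries

def mirror_timestamp(e: StreamElement) -> StreamElement:
    """Append a copy of the timestamp to the tuple so predicates can use it."""
    return StreamElement(e.s + (e.tau,), e.tau)

def select(stream, predicate):
    """Selection: the predicate sees only the tuple (incl. the mirrored copy), never tau itself."""
    return [e for e in stream if predicate(e.s)]

raw = [StreamElement(("A1", 87.5), 5), StreamElement(("B7", 42.0), 12)]
mirrored = [mirror_timestamp(e) for e in raw]

# Predicate on the mirrored attribute (last position): elements with timestamp >= 10
late = select(mirrored, lambda s: s[-1] >= 10)
print(late)  # [StreamElement(s=('B7', 42.0, 12), tau=12)]
```

Because the predicate only sees the copy, the query can filter on time without being able to alter the element's actual timestamp.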

Note that assigning timestamps to data stream tuples is a nontrivial problem. We do not dwell on this topic here and simply assume that appropriate timestamps are provided. For a detailed discussion of time management in data streams, see, for instance, Srivastava and Widom (2004).

To illustrate the notion of data streams, we give examples from the field of road traffic. Traffic applications such as traffic state estimation and forecasting require data describing the flowing traffic to be gathered, a task also called traffic monitoring. In Germany this is predominantly done by inductive loops, which are embedded in the road surface and connected to processing units at the roadside. Other data sources are weather stations and the vehicles themselves (so-called Floating Car Data, FCD). The acquired data, for example the mean speed or traffic volume, can be represented as data streams flowing from the detectors to a central control unit. Two possible data stream schemas for traffic management are, for example:

MeasurementStation(SectionID, Speed, Volume, Temperature, Humidity)

FCD(VehicleID, PosLatitude, PosLongitude, Speed)
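The two example schemas can be written down as record types, for instance in Python with `typing.NamedTuple`. This is only a sketch: the attribute types, the units (km/h for `Speed`) and the 40 km/h threshold in the hypothetical `slow` query are assumptions, not part of the original schemas. As in Definition 1, the timestamp accompanies each tuple but is not an attribute of the schema.

```python
from typing import NamedTuple

# The two example schemas as record types.
# Attribute types and units (km/h, vehicles per interval, °C, %) are assumptions.
class MeasurementStation(NamedTuple):
    SectionID: str
    Speed: float
    Volume: int
    Temperature: float
    Humidity: float

class FCD(NamedTuple):
    VehicleID: str
    PosLatitude: float
    PosLongitude: float
    Speed: float

# Stream elements are (tuple, timestamp) pairs; the timestamp is not part of the schema.
station_stream = [
    (MeasurementStation("S1", 95.0, 30, 12.5, 70.0), 1),
    (MeasurementStation("S2", 28.0, 55, 12.0, 72.0), 1),
    (MeasurementStation("S1", 35.0, 60, 11.5, 75.0), 2),
]
fcd_stream = [
    (FCD("V42", 48.137, 11.575, 31.0), 2),
]

def slow(stream, threshold_kmh=40.0):
    """Key attribute (first field) and timestamp of elements whose Speed is below a hypothetical threshold."""
    return [(t[0], ts) for t, ts in stream if t.Speed < threshold_kmh]

print(slow(station_stream))  # [('S2', 1), ('S1', 2)]
print(slow(fcd_stream))      # [('V42', 2)]
```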

A variety of data stream query languages have been proposed. In the Aurora system (Abadi et al., 2003), a graphical notation allows the user to explicitly specify query plans. Query operators (called boxes in Aurora) are composed into a lattice. This design was chosen because the authors consider optimizing the joint query plan for multiple declarative continuous queries to be too difficult. Nevertheless, Aurora still performs some optimizations based on runtime statistics gathered during query execution. Aurora's successor, Borealis (Abadi et al., 2005), extends these concepts to distributed stream processing.
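Aurora queries are drawn graphically rather than written in code, but the idea of composing boxes can be approximated in Python with generator functions. In the sketch below, `filter_box`, `map_box` and `connect` are hypothetical names, not part of Aurora's interface; boxes are only chained in a line, whereas an Aurora plan may also split and merge streams, and Aurora's runtime optimizations are not modelled.

```python
from typing import Callable, Iterable, Iterator, Tuple

Element = Tuple[tuple, int]                          # (tuple, timestamp)
Box = Callable[[Iterable[Element]], Iterator[Element]]

def filter_box(pred) -> Box:
    """Hypothetical 'filter' box: passes on elements whose tuple satisfies pred."""
    def box(stream):
        for s, ts in stream:
            if pred(s):
                yield (s, ts)
    return box

def map_box(fn) -> Box:
    """Hypothetical 'map' box: transforms the tuple, leaves the timestamp untouched."""
    def box(stream):
        for s, ts in stream:
            yield (fn(s), ts)
    return box

def connect(*boxes: Box) -> Box:
    """Wire boxes one after another (the 'arrows' of the query plan)."""
    def plan(stream):
        for b in boxes:
            stream = b(stream)
        return stream
    return plan

plan = connect(
    filter_box(lambda s: s[1] < 40.0),   # keep slow measurements (SectionID, Speed)
    map_box(lambda s: (s[0],)),          # project onto SectionID
)
stream = [(("S1", 95.0), 1), (("S2", 28.0), 1), (("S1", 35.0), 2)]
print(list(plan(stream)))  # [(('S2',), 1), (('S1',), 2)]
```

Because each box is a generator, an element is handed on as soon as it has been processed, which matches the element-at-a-time nature of stream processing.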

However, most continuous query languages borrow their concepts from relational query languages and algebra and extend them with new constructs for handling streaming data. Besides the fact that SQL is familiar to most developers (Stonebraker et al., 2005), this approach has the advantage of building on the large body of knowledge acquired about relational databases. Nevertheless, relational operators still have to be adapted to the new requirements of handling data streams.

The problem of data streams that are too large for persistent storage and require incremental evaluation of continuous queries was first recognized in the context of the Tapestry system (Terry et al., 1992). In Tapestry, the user specifies a query in a relational query language (TQL), which is transformed by a series of normalization steps into an incremental query. This incremental query is installed as a stored procedure in a relational DBMS and executed periodically to compute (almost) only the new results of the original query. However, because the query was not truly continuous, this model sidestepped some of the fundamental problems studied in current DSMSs.
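The periodic, incremental execution described for Tapestry can be sketched as follows. This is a simplification under stated assumptions: the table is append-only, the query is a monotone selection, and "new results" are those with a timestamp after the previous execution. The class `PeriodicIncrementalQuery` is hypothetical, and Tapestry's actual TQL normalization steps are not reproduced.

```python
from typing import List, Tuple

Element = Tuple[tuple, int]   # (tuple, timestamp); the table is assumed append-only

class PeriodicIncrementalQuery:
    """Hypothetical sketch: re-run a selection periodically, returning only new results.

    Only valid for a monotone selection over an append-only table.
    """
    def __init__(self, predicate):
        self.predicate = predicate
        self.last_run = float("-inf")   # time of the previous execution

    def run(self, table: List[Element], now: int) -> List[Element]:
        # Consider only tuples that arrived after the last execution, up to now.
        new = [(s, ts) for s, ts in table
               if self.last_run < ts <= now and self.predicate(s)]
        self.last_run = now
        return new

table = [(("S1", 95.0), 1), (("S2", 28.0), 2)]
q = PeriodicIncrementalQuery(lambda s: s[1] < 40.0)
print(q.run(table, now=2))   # [(('S2', 28.0), 2)]
table.append((("S1", 35.0), 3))
print(q.run(table, now=4))   # [(('S1', 35.0), 3)]
print(q.run(table, now=6))   # []
```

Results appear only when `run` is invoked, so new matching tuples remain unreported between two executions; this is the sense in which such a query is not really continuous.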

Jagadish et al. (1995) introduced the chronicle data model to maintain views with aggregation operators. The chronicle data model is also based
