Database Reference
In-Depth Information
Stream query languages
restrict the discussion to relational data streams.
A stream is defined as a tuple consisting of a
relational tuple and a timestamp.
Definition 1 (Data Stream) A stream S is a
(possibly infinite) bag (multiset) of elements
<s, τ>, where s is a tuple belonging to the schema
of S and τ ∈ T is the timestamp of the element.
There is no bound for the number of tuples
with the same timestamp, except that this number
must be finite. Queries must not be allowed to
transform timestamps. Arasu et al. (2006) therefore
chose not to include the timestamp in the schema,
since the abstract semantics of their query
language relies heavily on timestamps. However, it
is often necessary to use the timestamp in query
predicates. This can be achieved by mirroring the
timestamp in the schema.
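The following minimal Python sketch illustrates Definition 1 and
the mirroring idea. It is not taken from Arasu et al. (2006); the
names StreamElement, mirror_timestamp, select and the attribute
name "Timestamp" are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Iterator


@dataclass
class StreamElement:
    """One element <s, tau>: a relational tuple plus its timestamp."""
    s: Dict[str, Any]  # the tuple, keyed by attribute name (hypothetical layout)
    tau: int           # the timestamp, kept outside the schema


def mirror_timestamp(stream: Iterable[StreamElement],
                     attr: str = "Timestamp") -> Iterator[StreamElement]:
    """Copy tau into the tuple under `attr`; tau itself is left unchanged."""
    for e in stream:
        yield StreamElement({**e.s, attr: e.tau}, e.tau)


def select(stream: Iterable[StreamElement],
           predicate: Callable[[Dict[str, Any]], bool]) -> Iterator[StreamElement]:
    """Filter on tuple attributes only; the predicate never sees or alters tau."""
    for e in stream:
        if predicate(e.s):
            yield e


elements = [StreamElement({"Speed": 80}, 1), StreamElement({"Speed": 20}, 5)]
mirrored = list(mirror_timestamp(elements))
late = list(select(mirrored, lambda s: s["Timestamp"] > 3))
```

Because the predicate only receives the tuple, a query can test the
mirrored attribute without being able to modify the timestamp that
the semantics relies on.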
Note that assigning timestamps to data stream
tuples is a nontrivial problem. We do not discuss
it here and simply assume that appropriate
timestamps are provided. See, for instance,
Srivastava and Widom (2004) for a detailed
discussion of time management in data streams.
To illustrate the notion of data streams, we will
give examples from the field of road traffic.
Traffic applications, such as traffic state
estimation and forecasting, require data describing
the flowing traffic. Gathering this data is called
traffic monitoring. In Germany this is done
predominantly by inductive loops, which are
embedded in the road surface and connected to
processing units at the roadside. Other data
sources are weather stations and the vehicles
themselves (Floating Car Data, FCD). The acquired
data, for example the mean speed or traffic volume,
can be represented as data streams flowing from the
detectors to a central control unit. Two data
stream schemas for traffic management could be,
for example:
MeasurementStation(SectionID, Speed, Volume,
Temperature, Humidity) and FCD(VehicleID,
PosLatitude, PosLongitude, Speed)
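As a sketch, the two schemas could be written down as typed Python
tuples, with each stream element pairing a tuple with its
timestamp. The attribute types, the units noted in the comments,
the sample values, and the helper congested_sections are all
assumptions made for illustration; the text specifies only the
attribute names.

```python
from typing import Iterable, List, NamedTuple, Tuple


class MeasurementStation(NamedTuple):
    SectionID: int
    Speed: float        # assumed unit: km/h (mean speed)
    Volume: int         # assumed unit: vehicles per measurement interval
    Temperature: float
    Humidity: float


class FCD(NamedTuple):
    VehicleID: str
    PosLatitude: float
    PosLongitude: float
    Speed: float


def congested_sections(stream: Iterable[Tuple[MeasurementStation, int]],
                       max_speed: float) -> List[Tuple[int, int]]:
    """Return (SectionID, timestamp) for every element with Speed below max_speed."""
    return [(s.SectionID, tau) for s, tau in stream if s.Speed < max_speed]


# Made-up sample elements <s, tau>; two of them share the timestamp 100.
station_stream = [
    (MeasurementStation(12, 95.0, 40, 18.5, 0.60), 100),
    (MeasurementStation(13, 22.0, 75, 18.4, 0.61), 100),
    (MeasurementStation(12, 18.0, 80, 18.5, 0.60), 160),
]
slow = congested_sections(station_stream, 30.0)
```

The timestamp is carried next to the tuple rather than inside it,
which matches the convention of keeping it out of the schema.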
A variety of data stream query languages have
been proposed. In the Aurora system (Abadi et
al., 2003), a graphical notation allows the user to
provide query plans explicitly. Query operators
(called boxes in Aurora) are composed into a
lattice. This design was chosen because the authors
consider optimizing the joint query plan for
multiple declarative continuous queries too
difficult. Nevertheless, Aurora still performs some
optimizations based on runtime statistics
gathered during query execution. The successor
of Aurora, Borealis (Abadi et al., 2005), extends
these concepts to distributed stream processing.
Most continuous query languages, however, borrow
their concepts from relational query languages
and algebra and extend them with new constructs for
handling streaming data. Besides SQL being known
to most developers (Stonebraker et al., 2005),
this approach is advantageous because it builds on
the large body of knowledge acquired about
relational databases. Relational operators must
still be adapted to the new requirements of
handling data streams.
The problem of data streams that are too large
for persistent storage and require incremental
evaluation of continuous queries was first
recognized in the context of the Tapestry system
(Terry et al., 1992). In the Tapestry system,
the user specifies a query in a relational query
language (TQL), which is transformed by several
normalization steps into an incremental query.
This incremental query is installed as a stored
procedure in a relational DBMS and executed
periodically to compute (almost) only the new
results for the original query. However, this model
masked some of the fundamental problems that are
researched in current DSMS, since the query was
not really continuous.
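The idea of periodically executing an incremental query can be
sketched as follows. This is not Tapestry's actual rewriting or
TQL; it is a simplified illustration that uses Python's sqlite3
module in place of a stored procedure. The table measurements, the
class IncrementalQuery, and the assumption that the table is
append-only with timestamps that only grow between runs are all
hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory, append-only table of timestamped measurements."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE measurements (ts INTEGER, section_id INTEGER, speed REAL)"
    )
    return conn


class IncrementalQuery:
    """Filter query that, on each run, returns only rows newer than the last run."""

    def __init__(self, conn: sqlite3.Connection, max_speed: float) -> None:
        self.conn = conn
        self.max_speed = max_speed
        self.last_ts = -1  # highest timestamp seen by the previous run

    def run(self) -> list:
        rows = self.conn.execute(
            "SELECT ts, section_id, speed FROM measurements "
            "WHERE speed < ? AND ts > ? ORDER BY ts",
            (self.max_speed, self.last_ts),
        ).fetchall()
        # Advance the mark to the newest stored timestamp, matching or not.
        newest = self.conn.execute("SELECT MAX(ts) FROM measurements").fetchone()[0]
        if newest is not None:
            self.last_ts = newest
        return rows


conn = make_db()
query = IncrementalQuery(conn, max_speed=30.0)
conn.executemany("INSERT INTO measurements VALUES (?, ?, ?)",
                 [(1, 7, 25.0), (2, 7, 80.0)])
first = query.run()    # sees the rows with ts 1 and 2
conn.executemany("INSERT INTO measurements VALUES (?, ?, ?)", [(3, 8, 10.0)])
second = query.run()   # sees only the row with ts 3
third = query.run()    # nothing new has arrived
```

Each call to run corresponds to one periodic execution. Results
appear only when the query is next executed, which is one way to
see why such a query is not truly continuous.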
Jagadish et al. (1995) introduced the chronicle
data model to maintain views with aggregation
operators. The chronicle data model is also based