Box 1. A time-based window
SELECT SectionID, AVG(Speed)
FROM MeasurementStation [RANGE 60 seconds] AS m
GROUP BY SectionID

Box 2. A window using partitioning
SELECT SectionID, LaneNo, AVG(Speed)
FROM MeasurementStation [PARTITION BY LaneNo RANGE 2 minutes]
GROUP BY SectionID, LaneNo
Therefore, a time-based window can be defined
that selects only the data measured during the
last 60 seconds. A CQL query with such a
time-based window is shown in Box 1.
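As a rough illustration of what the window in Box 1 selects, the following Python sketch recomputes the average speed per section from the records of the last 60 seconds. The tuple layout (timestamp, SectionID, Speed), the function name, and the half-open interval convention are assumptions made for this example; they are not part of CQL.

```python
from collections import defaultdict


def avg_speed_last_range(records, now, range_seconds=60):
    """Average speed per section over records with now - range < ts <= now.

    Each record is an assumed (timestamp_seconds, section_id, speed) tuple.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, section_id, speed in records:
        # Keep only records that fall inside the time-based window.
        if now - range_seconds < ts <= now:
            sums[section_id] += speed
            counts[section_id] += 1
    return {section: sums[section] / counts[section] for section in sums}


# Hypothetical measurements: (timestamp, SectionID, Speed)
records = [
    (0, "A", 100.0),
    (30, "A", 80.0),
    (70, "A", 60.0),
    (75, "B", 90.0),
]
result = avg_speed_last_range(records, now=80)
print(result)
```

At time 80 the record with timestamp 0 is older than 60 seconds and no longer contributes to the average of section "A". A real DSMS would maintain the window incrementally instead of rescanning all records.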
To always capture the most recent results, the
window “moves” along the stream. This is called a
sliding window, because the window slides along
with the flow of the data stream. If we are
interested, every minute, in the stream records of
the last five minutes, successive windows overlap.
CQL expresses this sliding granularity with a
slide parameter. If we are instead interested,
every five minutes, in the records of the last
five minutes, the windows are disjoint. Such
windows are called tumbling windows and have been
used, for instance, in the Aurora system (Carney
et al., 2002). Carney et al. (2002) extend the
idea of tumbling windows with a latch window
operator, which can also store intermediate
results between two successive windows. Defining
a coarse granularity for window sliding may
improve the overall performance of the DSMS
running the query. However, the appropriate
granularity depends heavily on the application's
requirements.
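The difference between overlapping sliding windows and tumbling windows can be made concrete by listing the window bounds produced for a given range and slide. The sketch below is only illustrative: the function names and the convention that the first window closes once a full range has elapsed are assumptions.

```python
def window_bounds(start, end, range_s, slide_s):
    """List (lower, upper) bounds of windows evaluated every slide_s seconds."""
    bounds = []
    upper = start + range_s
    while upper <= end:
        bounds.append((upper - range_s, upper))
        upper += slide_s
    return bounds


def overlaps(bounds):
    """True if any window starts before its predecessor ends."""
    return any(nxt[0] < prev[1] for prev, nxt in zip(bounds, bounds[1:]))


# Range of five minutes, evaluated every minute: overlapping windows.
sliding = window_bounds(0, 600, range_s=300, slide_s=60)
# Range of five minutes, evaluated every five minutes: tumbling windows.
tumbling = window_bounds(0, 600, range_s=300, slide_s=300)

print(sliding)
print(tumbling)
```

With a slide equal to the range, each record belongs to exactly one window; with a smaller slide, a record is evaluated several times, which is one reason a coarser slide can reduce the load.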
Another technique for defining a validity
criterion for data stream elements is the
tuple-based window (Arasu et al., 2003b, 2006). A
tuple-based window model specifies the number N
of most recent tuples that are valid for the
current window. This idea is further specialized
by partitioned windows (Arasu et al., 2003b,
2006). A partitioned window definition takes a
set of attributes from the stream's schema and a
window size N as input. Similar to a grouping
operation, the stream is partitioned into
substreams by the values of these attributes. The
window contains the last N records of each group.
For our traffic application, we now assume that
the traffic is monitored separately for each lane
of the highway. Hence, we introduce a new
attribute LaneNo into the MeasurementStation data
stream schema. The query in Box 2 computes the
average speed for each lane separately and
creates a substream for each lane by using the
PARTITION BY operator from CQL.
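The tuple-based partitioned window described above (the last N records of each group) can be sketched in a few lines of Python. This is a minimal illustration and not how a DSMS implements it; the class name, the dictionary-based record layout, and the sample values are assumptions. Note that it keeps a fixed number of records per lane, whereas Box 2 bounds each partition by time.

```python
from collections import defaultdict, deque


class PartitionedWindow:
    """Keeps the last n records for each distinct value of a partitioning key."""

    def __init__(self, key, n):
        self.key = key
        # deque(maxlen=n) drops the oldest record once n is exceeded.
        self.partitions = defaultdict(lambda: deque(maxlen=n))

    def insert(self, record):
        self.partitions[self.key(record)].append(record)

    def average(self, field):
        """Average of a numeric field over the current window of each partition."""
        return {
            part: sum(r[field] for r in recs) / len(recs)
            for part, recs in self.partitions.items()
        }


# Partition by the LaneNo attribute and keep the last 2 records per lane.
window = PartitionedWindow(key=lambda r: r["LaneNo"], n=2)
for lane, speed in [(1, 100.0), (2, 80.0), (1, 90.0), (1, 70.0), (2, 60.0)]:
    window.insert({"LaneNo": lane, "Speed": speed})

lane_averages = window.average("Speed")
print(lane_averages)
```

The first record of lane 1 is evicted when the third one arrives, so each lane's average always reflects only its own most recent N records, independent of how fast the other lanes produce data.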
Li et al. (2005) define a sliding window by using
parameters for the size of the window (RANGE),
the sliding step (SLIDE), and the attribute over
which range and slide are defined (WATTR).
Furthermore, windowing techniques can be
distinguished according to the definition of their
bounds, also called edges (Patroumpas and Sellis,
2006). Suppose we want to track velocities on a
freeway section after a specific event, for
example, an accident. We want to analyze the
change in average velocity from the moment the
accident happened and monitor it until the
situation has been resolved. In this case we
would fix the lower bound of the window so that
it marks the time of the accident in the stream.
The upper bound of the window progresses as new
data records stream in. Windows of this kind are
called landmark windows. Specifically, landmark
windows are categorized into upper-bound and
lower-bound windows, depending on which edge is
fixed (Patroumpas and Sellis, 2006). Of course, a
landmark window with a fixed lower bound suffers
from the same problems as any data stream. Due to
the ever-growing size of the window, the amount
of resources required for query evaluation also
grows. Hence, such a
 