View Management Techniques and Their Application to Data Stream Management - Evolving Application Domains of Data Warehousing and Mining

Box 1. A time-based window

SELECT SectionID, AVG(Speed)
FROM MeasurementStation [Range 60 seconds] AS m
GROUP BY SectionID

Box 2. A window using partitioning

SELECT SectionID, LaneNo, AVG(Speed)
FROM MeasurementStation [PARTITION BY LaneNo RANGE 2 minutes]
GROUP BY SectionID, LaneNo

Therefore, a time-based window can be defined

which only selects measured data from the last 60

seconds. A CQL query with a time-based window

is shown in Box 1.
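
The effect of the query in Box 1 can be approximated outside a DSMS. The following Python sketch is only an illustration, not the way CQL or any particular system evaluates the query: the class name TimeWindowAverage, the tuple layout, and the half-open interval (now - range, now] are assumptions made here, and records are assumed to arrive in timestamp order.

```python
from collections import defaultdict, deque


class TimeWindowAverage:
    """Average speed per section over the last `range_seconds` seconds."""

    def __init__(self, range_seconds=60):
        self.range_seconds = range_seconds
        # Records as (timestamp, section_id, speed), oldest first.
        self.window = deque()

    def insert(self, timestamp, section_id, speed):
        # Assumption: timestamps are non-decreasing.
        self.window.append((timestamp, section_id, speed))
        self._evict(timestamp)

    def _evict(self, now):
        # Drop records outside the interval (now - range, now].
        while self.window and self.window[0][0] <= now - self.range_seconds:
            self.window.popleft()

    def averages(self):
        # Equivalent of AVG(Speed) ... GROUP BY SectionID over the window.
        sums = defaultdict(float)
        counts = defaultdict(int)
        for _, section_id, speed in self.window:
            sums[section_id] += speed
            counts[section_id] += 1
        return {s: sums[s] / counts[s] for s in sums}


w = TimeWindowAverage(range_seconds=60)
w.insert(0, "A", 100.0)
w.insert(30, "A", 80.0)
w.insert(30, "B", 60.0)
print(w.averages())
w.insert(70, "A", 70.0)  # the record with timestamp 0 is now older than 60 s
print(w.averages())
```

Recomputing the averages by scanning the window keeps the sketch short; an incremental variant would maintain the sums and counts on insert and eviction instead.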

To always gather the most recent results, the window “moves” along the stream. This is called a sliding window, because the window slides along with the flow of the data stream. If we are interested in the stream records of the last five minutes every minute, successive windows overlap. CQL expresses this sliding granularity with a slide parameter. If we are interested in a window representing the records of the last five minutes every five minutes, the windows are disjoint. Such windows are called tumbling windows and have been used, for instance, in the Aurora system (Carney et al., 2002). Carney et al. (2002) extend the idea of tumbling windows with a latch window operator, which can also store intermediate results between two successive windows. Defining a coarse granularity for window sliding may improve the overall performance of the DSMS running the query. However, the appropriate granularity depends heavily on the application's requirements.
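
The difference between overlapping and tumbling windows can be made concrete with a small Python sketch. It is a batch illustration under assumptions of our own: the function name windows, the (timestamp, speed) record layout, evaluation times at multiples of the slide, and the half-open interval (t - range, t] are not taken from CQL or Aurora.

```python
def windows(records, range_, slide, end_time):
    """Yield (t, contents) for t = slide, 2 * slide, ... up to end_time.

    Each window holds the records whose timestamp lies in (t - range_, t].
    """
    t = slide
    while t <= end_time:
        yield t, [r for r in records if t - range_ < r[0] <= t]
        t += slide


# One hypothetical record per minute: (timestamp in minutes, speed).
records = [(minute, 100 - minute) for minute in range(1, 11)]

# Range 5, slide 1: successive windows overlap.
for t, contents in windows(records, range_=5, slide=1, end_time=7):
    print("sliding ", t, [ts for ts, _ in contents])

# Range 5, slide 5: windows are disjoint (tumbling).
for t, contents in windows(records, range_=5, slide=5, end_time=10):
    print("tumbling", t, [ts for ts, _ in contents])
```

With a slide of one minute the query is re-evaluated ten times over these ten minutes, with a slide of five minutes only twice, which is the performance trade-off of a coarser granularity.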

Another technique for defining a validity criterion for data stream elements is the tuple-based window (Arasu et al., 2003b, 2006). A tuple-based window model defines the number N of most recent tuples that are valid for the current window. This idea is further specialized by partitioned windows (Arasu et al., 2003b, 2006). A partitioned window definition takes a set of attributes from the stream's schema and a window size N as input. Similar to a grouping operation, the stream is partitioned into substreams by the attribute values. The window contains the last N records of each group.
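
A partitioned tuple-based window can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any DSMS; the class name PartitionedWindow and the dictionary-based record layout are assumptions made for the example.

```python
from collections import defaultdict, deque


class PartitionedWindow:
    """Keeps the last n records for each combination of key attribute values."""

    def __init__(self, key_attrs, n):
        self.key_attrs = key_attrs
        self.n = n
        # One bounded queue per partition; a full deque drops its oldest entry.
        self.partitions = defaultdict(lambda: deque(maxlen=n))

    def insert(self, record):
        key = tuple(record[a] for a in self.key_attrs)
        self.partitions[key].append(record)

    def contents(self):
        # The window is the union of the last n records of every partition.
        return [r for part in self.partitions.values() for r in part]


w = PartitionedWindow(key_attrs=["LaneNo"], n=2)
for lane, speed in [(1, 90), (2, 110), (1, 85), (1, 80), (2, 120)]:
    w.insert({"LaneNo": lane, "Speed": speed})

for key, part in w.partitions.items():
    print(key, [r["Speed"] for r in part])
```

With an empty list of key attributes every record maps to the same partition, so the sketch degenerates to a plain tuple-based window over the last N records.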

For our traffic application we now assume that the traffic is monitored separately for each lane of the highway. Hence, we introduce a new attribute LaneNo in the MeasurementStation data stream schema. The query in Box 2 analyzes speed and traffic volume for each lane separately and creates a substream for each lane by using the PARTITION BY operator from CQL.

Li et al. (2005) define a sliding window by using parameters for the size of the window (RANGE), the sliding step (SLIDE) and the attribute over which range and slide are defined (WATTR).
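
To illustrate how these three parameters interact, the following Python sketch computes the windows a single record falls into. It is not the evaluation technique of Li et al. (2005); the function name window_ids, the numbering of windows by k (window k ends at k * SLIDE and covers (k * SLIDE - RANGE, k * SLIDE]), and the restriction to integer attribute values are assumptions made here.

```python
def window_ids(record, range_, slide, wattr):
    """Return the ids k of all windows that contain the record.

    Window k (k >= 1) covers values v of the windowing attribute with
    k * slide - range_ < v <= k * slide. Integer values are assumed.
    """
    value = record[wattr]
    first = max(1, -(-value // slide))        # ceil(value / slide)
    last = -(-(value + range_) // slide) - 1  # largest k with k * slide < value + range_
    return list(range(first, last + 1))


record = {"Timestamp": 3, "Speed": 95}

# RANGE 5, SLIDE 1: the record is part of five overlapping windows.
print(window_ids(record, range_=5, slide=1, wattr="Timestamp"))

# RANGE 5, SLIDE 5: tumbling windows, so the record is part of exactly one.
print(window_ids(record, range_=5, slide=5, wattr="Timestamp"))
```

Because the windowing attribute is a parameter, the same function works when the window is defined over another ordered attribute, such as a sequence number, instead of a timestamp.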

Furthermore, windowing techniques can be distinguished according to the definition of their bounds (also called edges (Patroumpas and Sellis, 2006)). Suppose we want to track velocities on a freeway section after a specific event has happened, for example, an accident. We want to analyze the change in average velocity since the moment of the accident and to monitor it until the situation has been resolved. In this case we would fix the lower bound of the window, such that it marks the time of the accident in the stream. The upper bound of the window progresses as new data records stream in. These kinds of windows are called landmark windows. Specifically, landmark windows are categorized into upper-bound and lower-bound windows, depending on the edge which is fixed (Patroumpas and Sellis, 2006). Of course, a landmark window with a fixed lower bound suffers from the same problems as any data stream. Due to the ever-growing size of the window, the amount of resources required for query evaluation also grows. Hence, such a
