Typically a streaming query of the sort shown above is converted into a
graph of operators, where data flows from input streams (“leaf nodes”) to
output operators and the user. For example, the AVG query above would consist
of three operators: an input operator that reads the incoming data stream and
packages it into tuples for processing, a select operator that filters out all non-
IBM stocks, and an aggregate operator that computes the average over the
IBM stocks.
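The three-operator plan described above can be sketched as a chain of Python generators. This is only an illustration, not the plan any particular DSMS would produce: the `Trade` record, the operator function names, and the choice to emit a running average after every matching tuple are all assumptions, since the original query and its window semantics are not reproduced here.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Trade:
    """Hypothetical tuple layout: a stock symbol and a price."""
    symbol: str
    price: float


def input_operator(raw: Iterable[Tuple[str, float]]) -> Iterator[Trade]:
    """Leaf node: read raw stream records and package them into tuples."""
    for symbol, price in raw:
        yield Trade(symbol, price)


def select_operator(trades: Iterable[Trade], symbol: str) -> Iterator[Trade]:
    """Filter: pass through only the tuples for the requested symbol."""
    for trade in trades:
        if trade.symbol == symbol:
            yield trade


def avg_operator(trades: Iterable[Trade]) -> Iterator[float]:
    """Aggregate: emit the running average price after each input tuple.

    A running average is an assumption; a real query would normally
    specify a window.
    """
    total = 0.0
    count = 0
    for trade in trades:
        total += trade.price
        count += 1
        yield total / count


def run_avg_query(raw: Iterable[Tuple[str, float]], symbol: str = "IBM") -> List[float]:
    """Wire the operators into a graph: input -> select -> aggregate."""
    return list(avg_operator(select_operator(input_operator(raw), symbol)))
```

Calling `run_avg_query([("IBM", 100.0), ("MSFT", 50.0), ("IBM", 110.0)])` drops the MSFT record at the select operator and yields one average per IBM tuple.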
Streaming systems include a runtime system that is responsible for executing
the compiled query plan. The main component of any runtime system is
a scheduler (or scheduling policy) that dictates the order in which operators
run. There are a number of interesting scheduling tradeoffs. For example, a
simple scheduling policy is to push each tuple all the way through the query
plan before moving on to the next tuple. This is good from the standpoint of
data cache locality, as it keeps the same tuple in cache through several
operators. It may be inefficient, however, if there is overhead in passing a
single tuple from one operator to the next, or if the operators do not all fit
into the instruction cache. An alternative is to batch several tuples together
and process them as a group, as proposed in the Aurora system [10], which
reduces per-tuple scheduling overheads and possibly improves instruction cache
locality.
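The two scheduling policies can be contrasted with a small sketch. It is a simplified model under stated assumptions, not Aurora's actual scheduler: each operator is a hypothetical function that takes one tuple and returns either a tuple or `None` (meaning the tuple was filtered out), and the `trace` list simply records which operator the scheduler dispatched to, as a stand-in for scheduling overhead.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical operator type: one tuple in, one tuple out, or None if dropped.
Operator = Callable[[object], Optional[object]]


def run_tuple_at_a_time(ops: List[Operator], tuples: Iterable[object],
                        trace: List[int]) -> List[object]:
    """Push each tuple through the whole plan before taking the next one."""
    out = []
    for t in tuples:
        cur = t
        for i, op in enumerate(ops):
            trace.append(i)      # one dispatch per operator per tuple
            cur = op(cur)
            if cur is None:      # tuple was filtered out; stop early
                break
        else:
            out.append(cur)      # tuple survived every operator
    return out


def run_batched(ops: List[Operator], tuples: Iterable[object],
                batch_size: int, trace: List[int]) -> List[object]:
    """Collect tuples into batches and run each operator once per batch."""
    out: List[object] = []
    batch: List[object] = []

    def flush() -> None:
        cur = list(batch)
        for i, op in enumerate(ops):
            trace.append(i)      # one dispatch per operator per batch
            results = (op(t) for t in cur)
            cur = [r for r in results if r is not None]
        out.extend(cur)
        batch.clear()

    for t in tuples:
        batch.append(t)
        if len(batch) == batch_size:
            flush()
    if batch:                    # process any final partial batch
        flush()
    return out
```

Both functions produce the same output for the same plan; what differs is the number of dispatches recorded in `trace`, which shrinks as `batch_size` grows. The price of batching is that each tuple waits for its batch to fill, and a batch may no longer fit in the data cache.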
Traditional DSMSs are excellent for a variety of applications that need
simple pattern matching and filtration over simple data types. As we discuss
below, many scientific applications require more sophisticated processing, such
as time/frequency domain conversions, signal processing filters, convolution,
support for arrays and matrices, and so on. In some cases, DSMSs are
extensible, meaning that they allow users to define their own types and operations
over those types, but this extension is typically done in some external language
(e.g., C or Java), which introduces several limitations, as discussed below.
11.2.2 Scientific Applications
As noted above, scientific applications often require stream processing. This
need is evident in the large number of signal-oriented streaming applications
proposed in the sensor network literature (where the predominant applications
are scientific in nature), including preventive maintenance of industrial
equipment [11]; detection of fractures and ruptures in pipelines [12], airplane
wings (http://www.metisdesign.com/shm.html), or buildings [13]; in situ animal
behavior studies using acoustic sensing [14]; network traffic analysis [15];
particle detectors in physics experiments; and medical applications such as
anomaly detection in electrocardiogram signals [16]. Another important
scientific area that requires management of streaming data is geosciences.
Examples of multiple sources of streaming data and their integration are
discussed in Chapter 10.
These target applications use a variety of embedded sensors, each sampling at
fine resolution and producing data at rates as high as hundreds of thousands
of samples per second.