Analyzing Data Streams in Scientific Applications - Scientific Data Management

Typically a streaming query of the sort shown above is converted into a
graph of operators, where data flows from input streams (“leaf nodes”) to
output operators and the user. For example, the AVG query above would consist
of three operators: an input operator that reads the incoming data stream and
packages it into tuples for processing, a select operator that filters out all
non-IBM stocks, and an aggregate operator that computes the average over the
IBM stocks.
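
The three-operator plan can be sketched in a few lines of Python. This is only an illustration, not how any particular DSMS compiles queries: the `Tick` schema, the `SYMBOL,PRICE` record format, and all function names are assumptions made for the example, and the aggregate is taken to be a running average because the query's window is not shown here.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Tick(NamedTuple):
    # Hypothetical schema; the schema of the original query is an assumption.
    symbol: str
    price: float


def input_op(raw_lines: Iterable[str]) -> Iterator[Tick]:
    """Input operator: parse raw 'SYMBOL,PRICE' records into tuples."""
    for line in raw_lines:
        symbol, price = line.strip().split(",")
        yield Tick(symbol, float(price))


def select_op(ticks: Iterable[Tick], symbol: str) -> Iterator[Tick]:
    """Select operator: drop every tuple whose symbol does not match."""
    return (t for t in ticks if t.symbol == symbol)


def avg_op(ticks: Iterable[Tick]) -> Iterator[float]:
    """Aggregate operator: emit the running average after each tuple."""
    total, count = 0.0, 0
    for t in ticks:
        total += t.price
        count += 1
        yield total / count


def run_query(raw_lines: Iterable[str]) -> List[float]:
    # Wire the operators into a linear graph: input -> select -> aggregate.
    return list(avg_op(select_op(input_op(raw_lines), "IBM")))
```

Calling `run_query` on a list of records such as `"IBM,100.0"` returns one average per IBM tuple seen so far. Because each operator is a generator, a tuple is pulled through the whole chain before the next record is read.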

Streaming systems include a runtime system that is responsible for executing
the compiled query plan. The main component of any runtime system is a
scheduler (or scheduling policy) that dictates the order in which operators
run. There are a number of interesting scheduling tradeoffs. For example, a
simple scheduling policy is to push each tuple all the way through the query
plan before moving on to the next tuple. This is good from the standpoint of
data cache locality, as it keeps the same tuple in cache through several
operators. It may be inefficient, however, if there is overhead to pass a
single tuple from one operator to the next, or if all of the operators don't
fit into the instruction cache. An alternative is to batch several tuples
together and process them as a group, as proposed in the Aurora system [10],
which reduces per-tuple scheduling overheads and possibly improves
instruction cache locality.
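
The two policies can be contrasted with a minimal sketch. It is not Aurora's scheduler: the `Operator` type, the function names, and the `steps` counter (used here as a rough stand-in for per-invocation scheduling overhead) are all assumptions for illustration. Both functions produce the same output and differ only in how often control moves between operators.

```python
from itertools import islice
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical operator type: maps a value to a new value, or None to drop it.
Operator = Callable[[float], Optional[float]]


def run_tuple_at_a_time(ops: List[Operator],
                        tuples: Iterable[float]) -> Tuple[List[float], int]:
    """Push each tuple through the whole plan before taking the next one."""
    out: List[float] = []
    steps = 0  # number of times control is handed to an operator
    for t in tuples:
        cur: Optional[float] = t
        for op in ops:
            steps += 1
            cur = op(cur)
            if cur is None:  # tuple was filtered out
                break
        if cur is not None:
            out.append(cur)
    return out, steps


def run_batched(ops: List[Operator], tuples: Iterable[float],
                batch_size: int) -> Tuple[List[float], int]:
    """Run each operator over a whole batch before moving to the next one."""
    out: List[float] = []
    steps = 0
    it = iter(tuples)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        for op in ops:
            if not batch:  # everything in this batch was filtered out
                break
            steps += 1  # one scheduling step per operator per batch
            batch = [r for r in (op(x) for x in batch) if r is not None]
        out.extend(batch)
    return out, steps


def keep_positive(x: float) -> Optional[float]:
    """Example select operator: drop non-positive values."""
    return x if x > 0 else None


def double(x: float) -> Optional[float]:
    """Example map operator: scale each value by two."""
    return x * 2
```

With a larger `batch_size`, the step count falls roughly in proportion to the batch size, which mirrors the reduced per-tuple overhead described above. The cost is that each tuple waits for its batch to fill before it reaches the next operator.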

Traditional DSMSs are excellent for a variety of applications that need
simple pattern matching and filtering over simple data types. As we discuss
below, many scientific applications require more sophisticated processing,
such as time/frequency domain conversions, signal processing filters,
convolution, support for arrays and matrices, and so on. In some cases, DSMSs
are extensible, meaning that they allow users to define their own types and
operations over those types, but this extension is typically done in some
external language (e.g., C or Java), which introduces several limitations, as
discussed below.
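
To make the kind of operation meant by "signal processing filters" and "convolution" concrete, here is a small sketch of a convolution (FIR filter) applied to a stream of samples. It is written in Python for brevity rather than in the extension language of any real DSMS, and the name `fir_filter` and its interface are assumptions for this example.

```python
from collections import deque
from typing import Iterable, Iterator, Sequence


def fir_filter(samples: Iterable[float],
               kernel: Sequence[float]) -> Iterator[float]:
    """Convolve a sample stream with a fixed kernel, one output per sample.

    Output starts once the window holds len(kernel) samples, so the first
    len(kernel) - 1 inputs produce no output.
    """
    window = deque(maxlen=len(kernel))  # holds the most recent samples
    for s in samples:
        window.append(s)
        if len(window) == len(kernel):
            # y[n] = sum_k kernel[k] * x[n - k]; reversed() puts the newest
            # sample first so it lines up with kernel[0].
            yield sum(h * x for h, x in zip(kernel, reversed(window)))
```

For instance, `fir_filter(stream, [0.5, 0.5])` yields a two-point moving average. The operator has to keep a window of past samples as state, which a simple per-tuple filter over scalar values does not need.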

11.2.2 Scientific Applications

As noted above, scientific applications often require stream processing. This
need is evident in a large number of signal-oriented streaming applications
proposed in the sensor network literature (where the predominant applications
are scientific in nature), including preventive maintenance of industrial
equipment [11]; detection of fractures and ruptures in pipelines [12],
airplane wings (http://www.metisdesign.com/shm.html), or buildings [13];
in situ animal behavior studies using acoustic sensing [14]; network traffic
analysis [15]; particle detectors in physics experiments; and medical
applications such as anomaly detection in electrocardiogram signals [16].
Another important scientific area that requires management of streaming data
is the geosciences. Examples of multiple sources of streaming data and their
integration are discussed in Chapter 10. These target applications use a
variety of embedded sensors, each sampling at fine resolution and producing
data at rates as high as hundreds of thousands of samples per second.
