tasks [19]. The ubiquity of mobile devices, location services, sensor pervasiveness
and real-time network monitoring have created the crucial need for building scalable
and parallel architectures to process vast amounts of streamed data.
In general, stream processing systems support a large class of applications in
which data are generated from multiple sources and are pushed asynchronously to
servers that are responsible for processing. Therefore, stream processing applications
are usually deployed as continuous jobs that run from the time of their submission
until their cancellation. Many applications in several domains such as telecommunications,
network security, and large-scale sensor networks require online processing
of continuous data flows. They produce very high loads that require aggregating the
processing capacity of many nodes. Rather than processing stored data as in traditional
database systems, stream processing engines process tuples on the fly. This is
because the volume of input discourages persistent storage and because results must
be provided promptly. Queries in streaming applications are generally continuous
and stateful. Once a query is registered, it starts processing events and only stops
when the system terminates or the query is deregistered from the system. Queries
typically maintain state such as aggregates of windows or local variables. Query
state is kept on the same node that executes the query.
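To make the notion of a continuous, stateful query concrete, the following is a minimal sketch of a windowed average. It is not taken from any particular system: the function name `tumbling_window_avg`, the `(timestamp, value)` event shape, and the assumption that events arrive in timestamp order are all illustrative choices. The running `total` and `count` play the role of query state held on the node that executes the query.

```python
from typing import Iterable, Iterator, Optional, Tuple


def tumbling_window_avg(
    events: Iterable[Tuple[int, float]], window_size: int
) -> Iterator[Tuple[int, float]]:
    """Consume (timestamp, value) events and emit (window_start, average)
    each time a fixed-size window closes.

    Hypothetical helper; assumes events arrive in timestamp order.
    """
    current_window: Optional[int] = None  # start of the window being aggregated
    total, count = 0.0, 0                 # query state, local to this operator

    for ts, value in events:
        window_start = (ts // window_size) * window_size
        if current_window is not None and window_start != current_window:
            # The previous window is complete: emit its result, reset state.
            yield current_window, total / count
            total, count = 0.0, 0
        current_window = window_start
        total += value
        count += 1

    # The input ended (comparable to deregistering the query): flush the
    # window that was still open.
    if count:
        yield current_window, total / count
```

Because the function is a generator, it produces each result as soon as a window closes rather than after all input has been read, which mirrors the on-the-fly processing described above. A real engine would additionally have to deal with late or out-of-order events.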
In the last decade, there have been substantial advancements in the field of data
stream processing. From centralized stream processing systems, the state of the art
has advanced to stream processing engines with the ability to distribute different
queries among a cluster of nodes [10,11,18]. This chapter provides an overview of
the main systems that have been proposed for achieving scalable processing
of streaming data.
12.2 AURORA
Aurora [2,7] is a centralized stream processor that is fundamentally presented
as a data-flow system and uses the popular boxes-and-arrows paradigm. In Aurora, a
stream is modeled as an append-only sequence of tuples with uniform type (schema).
In addition to application-specific data fields A1, ..., An, each tuple in a stream has
a timestamp (ts) that specifies its time of origin within the Aurora network. The
Aurora data model supports out-of-order data arrival. Tuples flow through a loop-free,
directed graph of processing operators (i.e., boxes). Ultimately, output streams
are presented to applications, which must be constructed to handle the asynchronously
arriving tuples in an output stream. Each operator accepts input streams,
transforms them in some way, and produces one or more output streams. By default,
queries are continuous in that they can potentially run forever over push-based
inputs. Figure 12.1 illustrates an overview of the Aurora system.
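The boxes-and-arrows model can be illustrated with a small sketch. This is not Aurora's implementation or API; `StreamTuple`, `filter_box`, `map_box`, and `connect` are hypothetical names chosen for the example, and the network shown is the simplest loop-free graph, a linear chain of two boxes. Each tuple carries a timestamp `ts` alongside its application-specific fields, and each box turns one input stream into one output stream.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class StreamTuple:
    ts: int        # timestamp: time of origin within the network
    fields: Tuple  # application-specific fields A1, ..., An


# A box maps an input stream to an output stream (illustrative type alias).
Box = Callable[[Iterable[StreamTuple]], Iterator[StreamTuple]]


def filter_box(pred: Callable[[StreamTuple], bool]) -> Box:
    """Box that passes on only the tuples satisfying `pred`."""
    def run(stream: Iterable[StreamTuple]) -> Iterator[StreamTuple]:
        for t in stream:
            if pred(t):
                yield t
    return run


def map_box(fn: Callable[[Tuple], Tuple]) -> Box:
    """Box that rewrites each tuple's fields and keeps its timestamp."""
    def run(stream: Iterable[StreamTuple]) -> Iterator[StreamTuple]:
        for t in stream:
            yield StreamTuple(t.ts, fn(t.fields))
    return run


def connect(*boxes: Box) -> Box:
    """Wire boxes into a chain: the output of one is the input of the next."""
    def run(stream: Iterable[StreamTuple]) -> Iterator[StreamTuple]:
        for box in boxes:
            stream = box(stream)
        return iter(stream)
    return run


# Hypothetical network: keep readings above 30.0, then convert them from
# degrees Celsius to degrees Fahrenheit.
network = connect(
    filter_box(lambda t: t.fields[1] > 30.0),
    map_box(lambda f: (f[0], f[1] * 9 / 5 + 32)),
)
```

Passing any iterable of `StreamTuple` values to `network(...)` yields the transformed tuples lazily, one at a time, so the same wiring would work over an unbounded, push-fed source. Graphs with fan-in or fan-out would need boxes that take or return several streams, which this sketch leaves out.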
The Aurora Stream Query Algebra (SQuAl) supports seven operators that are
used to construct Aurora network queries. The operators are analogous to operators
in the relational algebra, but they differ in fundamental ways in order to
address the special requirements of stream processing. They can be divided into two
main categories: (1) order-agnostic operators (Filter, Map, and Union) and (2) order-sensitive
operators (BSort, Aggregate, Join, and Resample). The behavior of these
operators is described as follows: