11.3.1 Parallel Stream Queries through Data Flow Distribution Templates
In GSDM, continuous queries (CQs) are expressed as functions over stream
windows, called stream query functions (SQFs), which are specified in a query
language. Data flow distribution templates are parameterized descriptions of
CQs as distributed compositions of other SQFs, together with a logical site
assignment for each subquery. For extensibility, a distribution template may
be defined in terms of other templates.
For scalable execution of CQs containing expensive SQFs, a generic template
called PCC is provided for customizable data partitioning parallelism
(Figure 11.4). The generic template contains three phases: partition, compute,
and combine. In the partition phase the stream is split into substreams; in the
compute phase subqueries are applied in parallel to each substream; and in the
combine phase the results of the computations are combined into one stream.
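The three phases can be illustrated with a minimal Python sketch. It is not GSDM code: the names pcc, round_robin, and interleave are hypothetical, a finite list of windows stands in for a stream, and a thread pool stands in for the parallel sites.

```python
from concurrent.futures import ThreadPoolExecutor

def pcc(windows, partition, compute, combine, n):
    """Hypothetical partition-compute-combine driver over a finite list of windows."""
    # Partition phase: split the input into n substreams.
    substreams = partition(windows, n)
    # Compute phase: apply the same subquery to every substream in parallel.
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(lambda s: [compute(w) for w in s], substreams))
    # Combine phase: merge the per-branch results into one output stream.
    return combine(results)

def round_robin(windows, n):
    # One possible partition: window k goes to branch k mod n.
    return [windows[b::n] for b in range(n)]

def interleave(results):
    # Matching combine: read the branches cyclically to restore window order.
    out = []
    for i in range(max(len(r) for r in results)):
        for r in results:
            if i < len(r):
                out.append(r[i])
    return out

windows = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
print(pcc(windows, round_robin, sum, interleave, 2))
```

Passing the partition and combine functions as arguments mirrors the idea that the template is generic and that concrete strategies are plugged into it.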
The generic distribution template has been used to define two more
specialized stream partitioning strategies: query-dependent window split
(WS) and query-independent window distribute (WD). Window split provides
application-dependent partition and combine strategies, while window
distribute is applicable to any SQF. Window split is favorable, for example,
for many numerical algorithms on vectors, such as the FFT (Fast Fourier
Transform), which scale through user-defined vector partitioning; window
distribute provides SQF-independent rerouting of substreams. Both strategies
use a pair of nonblocking, order-preserving SQFs to specify the partition and
combine phases.
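The contrast between the two strategies can be sketched as follows. All function names are hypothetical, and max is chosen only as a simple SQF whose partial results are easy to combine. The window distribute pair never inspects the SQF, whereas the window split combine must be written for the SQF at hand.

```python
def wd_partition(windows, n):
    # Window distribute: route whole window k, untouched, to branch k mod n.
    return [windows[b::n] for b in range(n)]

def wd_combine(branches):
    # Order-preserving merge: window k sits at position k // n of branch k mod n.
    n = len(branches)
    total = sum(len(b) for b in branches)
    return [branches[k % n][k // n] for k in range(total)]

def ws_partition(window, n):
    # Window split: cut one logical window into at most n smaller windows.
    size = max(1, -(-len(window) // n))  # ceiling division
    return [window[i:i + size] for i in range(0, len(window), size)]

def ws_combine_max(partials):
    # SQF-specific combine: the max of a window is the max of its pieces' maxima.
    return max(partials)

sqf = max
windows = [[3, 9, 1, 4], [7, 2, 8, 6], [5, 0, 2, 1]]

# WD: each branch applies the SQF to whole windows.
wd = wd_combine([[sqf(w) for w in b] for b in wd_partition(windows, 2)])
# WS: each branch applies the SQF to a piece of every window.
ws = [ws_combine_max([sqf(p) for p in ws_partition(w, 2)]) for w in windows]

print(wd, ws)
```

In this sketch window distribute leaves the cost per window unchanged and only spreads windows over branches, while window split reduces the size of the window each branch has to process.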
The partition phase in window split is defined by another template,
operator-dependent stream split (OS-Split), which performs
application-dependent splitting of logical windows into smaller ones. An SQF,
operator-dependent stream join (OS-Join), implements the combine phase. Window
split is particularly useful when scaling the logical window size for an SQF
whose complexity is higher than O(n) in the window size. For example, space
physics and many signal processing applications require the FFT to be applied
on large vector
[Figure: boxes labeled Partition, Compute (three boxes), and Combine; streams
labeled S1 and S2.]
Figure 11.4 The generic dataflow distribution template PCC for partitioning
parallelism of expensive stream query functions.
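As an illustration of window split for the FFT, the sketch below uses radix-2 decimation in time. This is an assumption: the text does not specify the splitting scheme. The functions os_split and os_join are hypothetical stand-ins for the OS-Split and OS-Join roles, a naive O(n^2) DFT plays the expensive SQF, and the window length is assumed to be even.

```python
import cmath

def dft(x):
    """Naive O(n^2) discrete Fourier transform, standing in for the expensive SQF."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def os_split(window):
    # Hypothetical OS-Split: even- and odd-indexed samples of one logical window.
    return window[0::2], window[1::2]

def os_join(even_f, odd_f):
    # Hypothetical OS-Join: radix-2 butterfly merging two half-size transforms.
    half = len(even_f)
    n = 2 * half
    out = [0j] * n
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd_f[k]  # twiddle factor times odd part
        out[k] = even_f[k] + t
        out[k + half] = even_f[k] - t
    return out

window = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
even, odd = os_split(window)
# The two dft calls are independent and could run on separate compute nodes.
combined = os_join(dft(even), dft(odd))
direct = dft(window)
print(max(abs(a - b) for a, b in zip(combined, direct)))
```

Each compute branch transforms a window of half the size, so for an SQF that is more expensive than linear in the window size the work per branch drops by more than half.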