a very efficient scheduler and memory manager that provide much better
throughput than existing stream processing systems.
11.3 Parallelizing High-Volume Scientific Stream Queries
WaveScope provides a complete functional programming language for specifying
high-volume stream processing computations. The nodes involved in these
computations can communicate using stream communication primitives where
the user explicitly specifies data interchange between WaveScope nodes. The
purpose of the systems described in this section is to provide primitives to
specify massively parallel and distributed computations in a functional query
language. The two systems GSDM (Grid Stream Data Manager) [19] and SCSQ
(Super Computer Stream Query processor) [21] provide two different ways of
parallelizing queries:
• GSDM provides a library of constructors of high-level data flow
  distribution templates to specify parallel execution schemes for functions
  used in declarative stream queries. GSDM has been applied to signal analysis
  in space physics applications.
• SCSQ provides declarative parallelization in queries by providing stream
  processes (SPs) as first-class objects in the query language. SCSQ has
  been applied to space physics and traffic applications.
Both GSDM and SCSQ are based on a functional data model [9] in which
declarative queries over streams are expressed in terms of functions.
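To make the idea of a query as a function over streams concrete, the following minimal Python sketch treats a stream as a lazy iterator and a query as a composition of functions on it. The names window, mean, and windowed_mean are hypothetical and chosen for illustration only; this is not GSDM or SCSQ syntax, neither of which is shown in this section.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def window(stream: Iterable[float], size: int) -> Iterator[List[float]]:
    """Cut a stream into consecutive, non-overlapping windows of `size` items."""
    it = iter(stream)
    while True:
        chunk = list(islice(it, size))
        if len(chunk) < size:
            return  # end of stream: a trailing partial window is dropped
        yield chunk


def mean(values: List[float]) -> float:
    """Average of one window."""
    return sum(values) / len(values)


def windowed_mean(stream: Iterable[float], size: int) -> Iterator[float]:
    """The 'query': a function from an input stream to an output stream."""
    return map(mean, window(stream, size))


# Results are produced lazily, one per completed window.
result = list(windowed_mean([1, 2, 3, 4, 5, 6, 7], 2))
print(result)
```

Because every stage is a function from streams to streams, stages can be composed freely, which is the property the parallelization approaches in this section build on.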
The motivating application is LOFAR [33], a radio telescope under
construction that uses an array of 25,000 omnidirectional antenna receivers
whose signals are digitized into data streams of very high rate. The LOFAR
antenna array will be the largest sensor network in the world. The receivers
produce raw data streams that arrive at the central processing facilities at a
rate that is too high for the data to be saved on disk. For these data-intensive
computations, LOFAR utilizes an IBM BlueGene supercomputer combined
with conventional Linux clusters.
High-performance stream processing for this kind of application requires the
ability to specify parallel continuous queries (CQs) running on nodes in a
heterogeneous hardware environment. To maximize the throughput of streams and
computations, it is important to parallelize CQs into continuous subqueries,
each executing as a separate process on some CPU. Often the best
parallelization method depends on properties of the computation executed by
the query, which makes it impossible to parallelize the execution
automatically. The query processing system must therefore provide primitives
for customized parallelization of continuous computations.
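As a rough illustration of customized parallelization, the sketch below splits one continuous computation into subqueries fed by a user-supplied routing function. It is written in Python with threads standing in for separate processes on separate CPUs, the stream is finite only so that the demo terminates, and all names (run_partitioned, route, subquery) are hypothetical; it does not reproduce the primitives of GSDM or SCSQ.

```python
import queue
import threading
from typing import Callable, Iterable, List

_END = object()  # sentinel marking the end of the (finite, demo-only) stream


def run_partitioned(
    stream: Iterable,
    subquery: Callable,
    route: Callable[[object, int], int],
    n_workers: int,
) -> List:
    """Split `stream` over `n_workers` subqueries and collect their outputs.

    `route(item, n_workers)` returns the index of the worker that should
    receive `item`; it is the user-controlled parallelization rule.
    """
    inboxes = [queue.Queue() for _ in range(n_workers)]
    outbox = queue.Queue()

    def worker(inbox: queue.Queue) -> None:
        # Each worker applies the subquery to every item routed to it.
        while True:
            item = inbox.get()
            if item is _END:
                break
            outbox.put(subquery(item))

    threads = [threading.Thread(target=worker, args=(q,)) for q in inboxes]
    for t in threads:
        t.start()
    for item in stream:
        inboxes[route(item, n_workers)].put(item)
    for q in inboxes:
        q.put(_END)
    for t in threads:
        t.join()

    # All producers have finished, so draining the queue here is safe.
    results = []
    while not outbox.empty():
        results.append(outbox.get())
    return results


# Example: route readings by antenna id so each antenna stays on one worker.
readings = [(antenna, sample) for antenna in range(4) for sample in (1.0, 2.0)]
powers = run_partitioned(
    readings,
    subquery=lambda r: (r[0], r[1] ** 2),
    route=lambda r, n: r[0] % n,
    n_workers=2,
)
print(sorted(powers))  # arrival order across workers is not deterministic
```

The routing function is where knowledge about the computation enters: routing by antenna id keeps each antenna's samples together, whereas a stateless computation could equally use round-robin distribution.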