CPUs, communication mechanisms, and operating systems substantially influence
query execution performance. These properties are stored in a database,
which is used by the query optimizer when assigning an SP to a CPU.
For example, the distributed grep query of MapReduce [23], using 1,000 parallel
grep calls, is specified in SCSQL as follows:
    merge(spv(select grep("pattern", filename(i))
              from Integer i
              where i in iota(1,1000)));
Notice, however, that MapReduce is limited to offline processing of stored data
with only one possible communication pattern, namely, map and reduce. By
contrast, SCSQ enables online processing of streams. Furthermore, arbitrary
communication patterns can be expressed using SCSQL. The example above
shows that MapReduce processing can also be easily expressed with SCSQL.
To enable easy handling of sets of parallel stream processes, the function
spv(s) assigns each continuous subquery in the set of subqueries s to a new
stream process on some compute node, and returns a set of handles to the
assigned stream processes. The function merge(p) requests elements from each
stream process in p, and terminates when (if ever) the last stream
process in p terminates.
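The behavior of spv() and merge() described above can be approximated in Python. This is only an illustrative sketch, not SCSQ's implementation: threads stand in for stream processes on compute nodes, a queue stands in for a handle, and the names _DONE, grep, and files are hypothetical helpers introduced for the example.

```python
import queue
import threading

_DONE = object()  # hypothetical sentinel: marks the end of one stream process


def spv(subqueries):
    """Start each subquery (a zero-argument callable returning an iterable)
    in its own thread and return one handle (a queue) per stream process."""
    handles = []
    for subquery in subqueries:
        out = queue.Queue()

        def run(sq=subquery, out=out):
            try:
                for element in sq():
                    out.put(element)
            finally:
                out.put(_DONE)  # signal termination of this process

        threading.Thread(target=run, daemon=True).start()
        handles.append(out)
    return handles


def merge(handles):
    """Yield elements from every handle; stop once the last process ends."""
    pending = list(handles)
    while pending:
        for handle in list(pending):
            try:
                item = handle.get(timeout=0.01)
            except queue.Empty:
                continue  # nothing yet from this process, poll the next one
            if item is _DONE:
                pending.remove(handle)
            else:
                yield item


def grep(pattern, lines):
    """Toy grep over an in-memory list of lines (assumption: no real files)."""
    return [line for line in lines if pattern in line]


# Four in-memory "files" instead of the 1,000 files of the SCSQL query.
files = {i: [f"line {i} pattern", f"line {i} other"] for i in range(1, 5)}
subqueries = [lambda i=i: grep("pattern", files[i]) for i in files]
result = sorted(merge(spv(subqueries)))
```

The order in which merge() yields elements depends on thread scheduling, which is why the usage example sorts the result before inspecting it. If a stream process never terminates, merge() never terminates either, matching the "(if ever)" in the text.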
Splitting of streams is specified by referencing common variables bound
to stream processes, as illustrated by the following query function, which
implements the Radix2 parallelization of FFT for a stream source named ss.
    create function radix2(String ss) -> Stream
      as select radixcombine(merge({a,b}))
         from SP a, SP b, SP c
         where a=sp(fft(odd (extract(c))))
           and b=sp(fft(even(extract(c))))
           and c=sp(receiver(ss));
The receiver() function returns a stream of 1D arrays of signal data.
odd(x) and even(x) obtain odd and even elements from array x, respectively.
radixcombine() combines the results from the partial FFT algorithms working
in parallel.
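The data flow of radix2 can be sketched sequentially in Python to show what the split and the combine step compute. Several things here are assumptions rather than SCSQ's definitions: indices are 0-based (so even(x) takes positions 0, 2, 4, ...), a naive DFT stands in for fft(), radixcombine() takes the two partial results as separate arguments instead of a merged stream, and array lengths are assumed to be even.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform, used as a stand-in for fft()."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def even(x):
    """Elements at even 0-based positions (assumed indexing convention)."""
    return x[0::2]


def odd(x):
    """Elements at odd 0-based positions (assumed indexing convention)."""
    return x[1::2]


def radixcombine(fft_even, fft_odd):
    """Radix-2 butterfly: combine two half-size transforms into a full one."""
    half = len(fft_even)
    n = 2 * half
    out = [0j] * n
    for k in range(half):
        twiddled = cmath.exp(-2j * cmath.pi * k / n) * fft_odd[k]
        out[k] = fft_even[k] + twiddled
        out[k + half] = fft_even[k] - twiddled
    return out


def radix2(stream):
    """For each array in the stream, transform both halves and combine them.
    In SCSQ the two half-size transforms would run in separate stream processes."""
    for array in stream:
        yield radixcombine(dft(even(array)), dft(odd(array)))


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
combined = next(radix2([signal]))
direct = dft(signal)
```

In this sketch, combined and direct agree up to floating-point rounding, which is the property that lets the two half-size transforms be computed independently.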
The output of an SP is sent to one or more other SPs, which are called
subscribers of that SP. The user can control which tuples are sent to which
subscriber using a postfilter [24]. The postfilter is expressed in SCSQL, and can
be any function that operates on the output stream of its SP. For each output
tuple from an SP, the postfilter is called once per subscriber. Hence, the
postfilter can transform and filter the output of an SP to determine whether
a tuple should be sent to a subscriber.
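The once-per-subscriber calling convention can be illustrated with a small Python sketch. The class StreamProcess, its methods, the subscriber names "hot" and "all", and the (sensor, value) tuples are all hypothetical; real postfilters are written in SCSQL, and this only mimics the control flow described above.

```python
class StreamProcess:
    """Toy stream process that delivers output tuples to named subscribers."""

    def __init__(self, postfilter=None):
        self.subscribers = []  # list of (name, inbox) pairs
        # Default postfilter: forward every tuple unchanged to every subscriber.
        self.postfilter = postfilter or (lambda tup, subscriber: tup)

    def subscribe(self, name):
        """Register a subscriber and return the list that receives its tuples."""
        inbox = []
        self.subscribers.append((name, inbox))
        return inbox

    def emit(self, tup):
        """Call the postfilter once per subscriber for this output tuple."""
        for name, inbox in self.subscribers:
            out = self.postfilter(tup, name)
            if out is not None:  # None means: do not send to this subscriber
                inbox.append(out)


def postfilter(tup, subscriber):
    """Filter for subscriber 'hot', transform (round) for everyone else."""
    sensor, value = tup
    if subscriber == "hot":
        return tup if value > 30 else None
    return (sensor, round(value))


sp = StreamProcess(postfilter)
hot = sp.subscribe("hot")
everything = sp.subscribe("all")
for tup in [("s1", 21.4), ("s2", 35.2), ("s1", 30.6)]:
    sp.emit(tup)
```

Because the decision is made inside the producing process, a tuple that the postfilter rejects for a subscriber is never sent to it, which is where the savings in communication come from.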
In the example query above, all elements from c are sent to both a and b.
Processes a and b apply the odd() and even() filter functions to extract the odd
and even elements of the vectors from c. Obviously, the amount of communication from
c to a and b can be reduced by 50% if a postfilter is applied in c before its