Database Reference
In-Depth Information
Achieving High Freshness and Optimal Throughput in
CPU-Limited Execution of Multi-join Continuous
Queries
Abhishek Mukherji, Elke A. Rundensteiner, and Matthew O. Ward
Computer Science Department, Worcester Polytechnic Institute, Worcester MA, USA
{ mukherab,rundenst,matt } @wpi.edu
Abstract. Due to high data volumes and unpredictable arrival rates, continuous
query systems processing expensive queries in real-time may fail to keep up with
the input data streams - resulting in buffer overflow and uncontrolled data loss.
We explore join direction adaptation (JDA) to tackle CPU-limited processing of
multi-join stream queries. The existing JDA solutions allocate the scarce CPU re-
sources to the most productive half-way join within a single operator. We instead
leverage the operator interdependencies to optimize the overall query through-
put. We identify result staleness , typically ignored by most state-of-the-art tech-
niques, as a critical issue in CPU-limited processing. It gets further aggravated
if throughput optimizing techniques are employed. We establish the novel path-
productivity model and the Freshness predicate. Our proposed JAQPOT approach
is the first integrated solution to achieve near optimal query throughput while
also guaranteeing freshness satisfiability. JAQPOT runs in quadratic time of the
number of streams irrespective of the query plan shape. Our experimental study
demonstrates the superiority of JAQPOT in achieving higher throughput than the
state-of-the-art JDA strategy while also fulfilling freshness predicates.
1
Introduction
Motivation. Data Stream Management Systems (DSMS) [2, 5, 15] are in high demand
for real-time decision support as they transform huge amounts of streaming data into
usable knowledge. Due to rapid expansions in the diversity of data sources and the vol-
ume of data these sources deliver, DSMS are faced with the challenge of processing user
queries demanding real-time responsiveness even under conditions of unpredictability,
high and bursty data volumes.
Windowed joins across streams, while among the most common queries in DSMS
applications, are more costly compared to other operations such as selection, aggrega-
tion and projection [8, 9, 11]. When processing complex join queries, either the pro-
cessor may fail to keep up with the arrival rates of the streams (the CPU-limited case)
or the available main memory may become insufficient to hold all relevant tuples (the
memory-limited case). For queries composed of joins with large states across multiple
high-speed data streams, the system is even more prone to such resource deficiencies.
This work was supported by NSF grants IS-0812027, CCF-0811510, IIS-0917017 and IIS-
1018443.
 
Search WWH ::




Custom Search