Database Reference
In-Depth Information
Achieving High Freshness and Optimal Throughput in
CPU-Limited Execution of Multi-join Continuous
Queries
Abhishek Mukherji, Elke A. Rundensteiner, and Matthew O. Ward
Computer Science Department, Worcester Polytechnic Institute, Worcester MA, USA
{
mukherab,rundenst,matt
}
@wpi.edu
Abstract.
Due to high data volumes and unpredictable arrival rates, continuous
query systems processing expensive queries in
real-time
may fail to keep up with
the input data streams - resulting in buffer overflow and uncontrolled data loss.
We explore
join direction adaptation
(JDA) to tackle CPU-limited processing of
multi-join stream queries. The existing JDA solutions allocate the scarce CPU re-
sources to the most productive half-way join within a
single
operator. We instead
leverage the operator
interdependencies
to optimize the overall query through-
put. We identify
result staleness
, typically ignored by most state-of-the-art tech-
niques, as a critical issue in CPU-limited processing. It gets further aggravated
if throughput optimizing techniques are employed. We establish the novel
path-
productivity
model and the
Freshness
predicate. Our proposed
JAQPOT
approach
is the first integrated solution to achieve
near optimal
query throughput while
also guaranteeing freshness satisfiability. JAQPOT runs in quadratic time of the
number of streams irrespective of the query plan shape. Our experimental study
demonstrates the superiority of JAQPOT in achieving higher throughput than the
state-of-the-art JDA strategy while also fulfilling freshness predicates.
1
Introduction
Motivation.
Data Stream Management Systems
(DSMS) [2, 5, 15] are in high demand
for real-time decision support as they transform huge amounts of streaming data into
usable
knowledge. Due to rapid expansions in the diversity of data sources and the vol-
ume of data these sources deliver, DSMS are faced with the challenge of processing user
queries demanding
real-time
responsiveness even under conditions of unpredictability,
high and bursty data volumes.
Windowed joins across streams, while among the most common queries in DSMS
applications, are more costly compared to other operations such as selection, aggrega-
tion and projection [8, 9, 11]. When processing complex join queries, either the pro-
cessor may fail to keep up with the arrival rates of the streams (the CPU-limited case)
or the available main memory may become insufficient to hold all relevant tuples (the
memory-limited case). For queries composed of joins with large states across multiple
high-speed data streams, the system is even more prone to such resource deficiencies.
This work was supported by NSF grants IS-0812027, CCF-0811510, IIS-0917017 and IIS-
1018443.