Achieving High Freshness and Optimal Throughput in CPU-Limited Execution of Multi-join Continuous Queries - Advances in Databases

Database Reference

In-Depth Information

Achieving High Freshness and Optimal Throughput in

CPU-Limited Execution of Multi-join Continuous

Queries

Abhishek Mukherji, Elke A. Rundensteiner, and Matthew O. Ward

Computer Science Department, Worcester Polytechnic Institute, Worcester MA, USA

{ mukherab,rundenst,matt } @wpi.edu

Abstract. Due to high data volumes and unpredictable arrival rates, continuous

query systems processing expensive queries in real-time may fail to keep up with

the input data streams - resulting in buffer overflow and uncontrolled data loss.

We explore join direction adaptation (JDA) to tackle CPU-limited processing of

multi-join stream queries. The existing JDA solutions allocate the scarce CPU re-

sources to the most productive half-way join within a single operator. We instead

leverage the operator interdependencies to optimize the overall query through-

put. We identify result staleness , typically ignored by most state-of-the-art tech-

niques, as a critical issue in CPU-limited processing. It gets further aggravated

if throughput optimizing techniques are employed. We establish the novel path-

productivity model and the Freshness predicate. Our proposed JAQPOT approach

is the first integrated solution to achieve near optimal query throughput while

also guaranteeing freshness satisfiability. JAQPOT runs in quadratic time of the

number of streams irrespective of the query plan shape. Our experimental study

demonstrates the superiority of JAQPOT in achieving higher throughput than the

state-of-the-art JDA strategy while also fulfilling freshness predicates.

1

Introduction

Motivation. Data Stream Management Systems (DSMS) [2, 5, 15] are in high demand

for real-time decision support as they transform huge amounts of streaming data into

usable knowledge. Due to rapid expansions in the diversity of data sources and the vol-

ume of data these sources deliver, DSMS are faced with the challenge of processing user

queries demanding real-time responsiveness even under conditions of unpredictability,

high and bursty data volumes.

Windowed joins across streams, while among the most common queries in DSMS

applications, are more costly compared to other operations such as selection, aggrega-

tion and projection [8, 9, 11]. When processing complex join queries, either the pro-

cessor may fail to keep up with the arrival rates of the streams (the CPU-limited case)

or the available main memory may become insufficient to hold all relevant tuples (the

memory-limited case). For queries composed of joins with large states across multiple

high-speed data streams, the system is even more prone to such resource deficiencies.

This work was supported by NSF grants IS-0812027, CCF-0811510, IIS-0917017 and IIS-

1018443.

Search WWH ::

Custom Search

Home