Achieving High Freshness and Optimal Throughput in CPU-Limited Execution of Multi-join Continuous Queries - Advances in Databases

SELECT B.symbol, B.price
FROM stocksNYC A [WIN 10 mins], stocksTokyo B [WIN 10 mins]
WHERE A.symbol = "GOOG" AND B.Volume > A.Volume

Fig. 1. Example Query

Fig. 2. A pipeline of join operators

Gedik et al. [8] observe that with increasing stream arrival rates and large join states, the CPU typically becomes strained before the memory does. Temporary data flushing [11] and compressed data representations further reduce the likelihood of a memory-limited scenario. If, under duress, complete results can no longer be produced at run-time, then the DSMS must employ the available resources to maximize run-time throughput (output rate). Therefore, in this work, we aim at optimizing the throughput of multi-join queries in CPU-limited cases.

When resources are limited, yet another pressing issue arises, namely result staleness. In Query Q1 (Fig. 1), a stock trader is interested in the companies whose stocks were traded in Tokyo in higher volumes than Google stocks traded in NYC. He wants the comparable transactions to happen within 10 minutes of each other. For real-time decision making, the DSMS may be required to produce results continuously (say, once every minute). However, if the system faces high workloads and backlogs in processing, result tuples may get delayed. For example, the trader may receive results about transactions that took place 15 minutes before the current time. Such results, despite satisfying the 10-minute window predicate, would be considered stale and useless by the trader. Clearly, high throughput with no freshness guarantees is unacceptable in real-time applications, since the results produced may already be useless.

In addition to the WINDOW predicate, the trader may want to specify a freshness predicate to indicate his tolerance to staleness. A freshness predicate may be defined on each stream, e.g., 12 mins for stocksNYC and 15 mins for stocksTokyo. To the best of our knowledge, our work is the first to identify the result staleness problem in the context of resource-limited execution of multi-join plans and to tackle the dual problems of achieving optimal throughput while satisfying freshness of the join results.
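The difference between the window predicate and a freshness predicate can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the names StockTuple, satisfies_window and is_fresh are hypothetical, and freshness is assumed here to mean that each contributing tuple is no older than its stream's bound at the moment the result is emitted. The text does not fix these semantics.

```python
from dataclasses import dataclass

MINUTE = 60  # all times below are in seconds


@dataclass(frozen=True)
class StockTuple:
    symbol: str
    volume: int
    ts: float  # transaction timestamp


# Example values taken from the text; the dictionary layout is an assumption.
WINDOW = 10 * MINUTE
FRESHNESS = {"stocksNYC": 12 * MINUTE, "stocksTokyo": 15 * MINUTE}


def satisfies_window(a: StockTuple, b: StockTuple, window: float = WINDOW) -> bool:
    """Window predicate: the two transactions lie within `window` of each other."""
    return abs(a.ts - b.ts) <= window


def is_fresh(a: StockTuple, b: StockTuple, now: float, bounds=FRESHNESS) -> bool:
    """Assumed freshness predicate: each input tuple's age at output time
    must not exceed the bound declared for its stream."""
    return (now - a.ts) <= bounds["stocksNYC"] and (now - b.ts) <= bounds["stocksTokyo"]


nyc = StockTuple("GOOG", 1_000, ts=0)
tokyo = StockTuple("SONY", 5_000, ts=5 * MINUTE)

on_time = is_fresh(nyc, tokyo, now=11 * MINUTE)   # emitted with little delay
delayed = is_fresh(nyc, tokyo, now=14 * MINUTE)   # emitted after a backlog
```

Both outputs satisfy the 10-minute window, but under these assumptions only the first is fresh: in the delayed case the NYC tuple is 14 minutes old, which exceeds its 12-minute bound.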

The State-of-the-art. Two directions for tackling join queries under computing limitations are load shedding [4, 9, 16] and join direction adaptation (JDA) [8, 10]. The main focus of load shedding is to reduce the input rates by directly dropping tuples from the source streams [4]. Because the dropped data is permanently lost, the plan cannot recover and produce accurate results during later periods of low workload.
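As a rough illustration of why shedding is irreversible, the following Python sketch drops tuples at the source with a fixed probability. It is a generic random-drop scheme chosen for brevity, not the specific policies of [4, 9, 16]; the function name shed_load and its parameters are hypothetical.

```python
import random


def shed_load(tuples, drop_probability, seed=0):
    """Random-drop load shedding (illustrative only).

    Each source tuple is discarded with probability `drop_probability`.
    Discarded tuples are never stored, so they cannot be joined later,
    even if the workload subsequently drops.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    kept = []
    for t in tuples:
        if rng.random() >= drop_probability:
            kept.append(t)
    return kept


stream = list(range(1000))
survivors = shed_load(stream, drop_probability=0.5)
```

Any join result that would have involved a tuple missing from survivors is lost for good, which is the drawback described above.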

Unlike load shedding, JDA preserves in-memory tuples as per the join semantics for opportunities of joining with future incoming tuples. Existing JDA techniques [8, 10] exploit the asymmetry in the productivities of half-way join directions within a join operator. However, JDA techniques have so far been explored only in the context of a single join operator. We demonstrate in this work that new challenges arise in the multi-join case. A detailed review of the related work is provided in Sec. 6.
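The asymmetry between the two half-way join directions of a single operator can be sketched in Python as follows. This is a simplified measurement under stated assumptions, not the adaptation algorithms of [8, 10]: productivity is taken here to be output tuples per comparison, and the function half_join_productivity and its event format are hypothetical.

```python
from collections import deque


def half_join_productivity(events, window, predicate):
    """Measure the productivity of both half-way join directions.

    events: iterable of (ts, stream, value), sorted by ts, stream in {"A", "B"}.
    A tuple arriving on one stream probes the window state of the other
    stream; that probe is one half-way join direction.
    Returns outputs per comparison for probes issued by A and by B.
    """
    state = {"A": deque(), "B": deque()}
    outputs = {"A": 0, "B": 0}
    comparisons = {"A": 0, "B": 0}
    for ts, stream, value in events:
        other = "B" if stream == "A" else "A"
        # Expire tuples of the opposite stream that fell out of the window.
        while state[other] and ts - state[other][0][0] > window:
            state[other].popleft()
        # Probe the opposite window; tuples are kept in state, never dropped early.
        for _, other_value in state[other]:
            comparisons[stream] += 1
            a, b = (value, other_value) if stream == "A" else (other_value, value)
            if predicate(a, b):
                outputs[stream] += 1
        state[stream].append((ts, value))
    return {
        s: (outputs[s] / comparisons[s] if comparisons[s] else 0.0)
        for s in ("A", "B")
    }


# Values stand for trade volumes; the predicate mirrors B.Volume > A.Volume.
events = [(0, "A", 10), (1, "B", 5), (2, "B", 20), (3, "A", 1)]
rates = half_join_productivity(events, window=10, predicate=lambda a, b: b > a)
```

On this toy input, probes issued by stream A are twice as productive as probes issued by stream B, so a CPU-limited operator would gain more output per unit of work by favoring the A direction.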

Research Challenges. In general, the ability of multi-join queries to achieve high result throughput and to maintain result freshness under heavy workloads relies on resolving the following aspects of the problem:
