SELECT B.symbol, B.price
FROM stocksNYC A [WIN 10 mins], stocksTokyo B [WIN 10 mins]
WHERE A.symbol = "GOOG" AND B.Volume > A.Volume
……
Fig. 1. Example Query
Fig. 2. A pipeline of join operators
Gedik et al. [8] observe that with increasing stream arrival rates and large join states,
the CPU typically becomes strained before the memory does. Temporary data flushing
[11] and compressed data representations further reduce the likelihood of a
memory-limited scenario. If, under duress, complete results can no longer be produced at
run-time, then the DSMS must use the available resources to maximize the run-time
throughput (output rate). Therefore, in this work, we aim at optimizing
the throughput of multi-join queries in CPU-limited cases.
When resources are limited, yet another pressing issue arises, namely result staleness.
In Query Q1 (Fig. 1), a stock trader is interested in the companies whose stocks
were traded in Tokyo in higher volumes than Google stocks traded in NYC. He wants
the comparable transactions to happen within 10 minutes of each other. For real-time
decision making, the DSMS may be required to produce results continuously (say, once
every minute). However, if the system faces high workloads and processing backlogs,
result tuples may get delayed. For example, the trader may receive results about
transactions that took place 15 minutes before the current time. Such results, despite
satisfying the 10-minute window predicate, would be considered stale and useless by
the trader. Clearly, high throughput with no freshness guarantees is unacceptable
in real-time applications, as the system may be producing results already deemed useless.
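The semantics of Q1 can be illustrated with a minimal Python sketch. It assumes that "within 10 minutes of each other" means the absolute difference of the two transaction timestamps is at most 600 seconds; the `Trade` record, its `ts` field, and the function names are hypothetical and not part of the original query language. The nested-loop evaluation only illustrates the predicates and is not how a DSMS would execute the join.

```python
from dataclasses import dataclass

WINDOW_SECS = 10 * 60  # [WIN 10 mins] from Q1


@dataclass(frozen=True)
class Trade:
    symbol: str
    price: float
    volume: int
    ts: float  # transaction time in seconds (hypothetical field)


def q1_matches(a: Trade, b: Trade) -> bool:
    """True if the pair (a from stocksNYC, b from stocksTokyo) satisfies Q1."""
    # Assumed window semantics: both trades lie within 10 minutes of each other.
    within_window = abs(a.ts - b.ts) <= WINDOW_SECS
    return within_window and a.symbol == "GOOG" and b.volume > a.volume


def q1_join(nyc, tokyo):
    """Naive evaluation of Q1: returns (B.symbol, B.price) for every matching pair."""
    return [(b.symbol, b.price) for a in nyc for b in tokyo if q1_matches(a, b)]
```

For a GOOG trade at time 0 with volume 500, a Tokyo trade five minutes later with a higher volume qualifies, whereas one fifteen minutes later fails the window predicate regardless of its volume.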
In addition to the WINDOW predicate, the trader may want to specify a freshness
predicate to indicate his tolerance to staleness. A freshness predicate may be defined
on each stream, e.g., 12 mins for stocksNYC and 15 mins for stocksTokyo. To the
best of our knowledge, our work is the first to identify the result staleness problem in the
context of resource-limited execution of multi-join plans and to tackle the dual problems
of achieving optimal throughput while satisfying freshness of the join results.
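A per-stream freshness predicate could be checked along the following lines. This is a sketch under an assumption the text does not spell out: a result is taken to be fresh if, at the moment it is output, the tuple contributed by each stream is no older than that stream's freshness bound. The names `FRESHNESS_SECS` and `is_fresh` are hypothetical.

```python
# Per-stream freshness bounds from the example in the text, in seconds.
FRESHNESS_SECS = {"stocksNYC": 12 * 60, "stocksTokyo": 15 * 60}


def is_fresh(source_timestamps, now):
    """Check a join result against the per-stream freshness bounds.

    source_timestamps maps a stream name to the timestamp (seconds) of the
    tuple that stream contributed to the result; now is the output time.
    Assumption: freshness is the age of each contributing tuple at output time.
    """
    return all(
        now - ts <= FRESHNESS_SECS[stream]
        for stream, ts in source_timestamps.items()
    )
```

With an NYC tuple from time 0 and a Tokyo tuple from five minutes later, a result emitted at minute 10 passes, while the same result emitted at minute 13 is rejected because the NYC tuple exceeds its 12-minute bound, even though the pair still satisfies the 10-minute window predicate.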
The State-of-the-art. Two directions for tackling join queries under computing
limitations are load shedding [4, 9, 16] and join direction adaptation (JDA) [8, 10]. The main
focus of load shedding is to reduce the input rates by directly dropping tuples from the
source streams [4]. Because the dropped data is permanently lost, the plan cannot
recover and produce accurate results even in moments of low workload.
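The effect of dropping tuples at the source can be seen in a small sketch. It assumes the simplest possible policy, uniform random dropping with a fixed probability; the shedders in the cited work may choose victims differently, and the function name `shed` is hypothetical.

```python
import random


def shed(stream, drop_prob, rng):
    """Random load shedding: drop each incoming tuple with probability drop_prob.

    Dropped tuples never reach the join state, so any results they would have
    contributed to are lost for good, even if the load later decreases.
    """
    kept = []
    for tup in stream:
        if rng.random() >= drop_prob:  # keep with probability 1 - drop_prob
            kept.append(tup)
    return kept
```

Calling `shed(tuples, 0.5, random.Random(0))` keeps roughly half of the input in its original order; nothing downstream can reconstruct the other half.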
Unlike load shedding, JDA preserves in-memory tuples as per the join semantics for
opportunities of joining with future incoming tuples. Existing JDA techniques [8, 10]
exploit the asymmetry in the productivities of half-way join directions within a join
operator. However, JDA techniques have so far been explored only in the context of
a single join operator. We demonstrate in this work that new challenges arise in the
multi-join case. A detailed review of the related work is provided in Sec. 6.
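The idea of half-way join directions within a single operator can be illustrated with a toy sketch. It is not the algorithm of [8] or [10]: the class `HalfWayJoin`, the productivity measure (results per comparison), the `overloaded` flag, and the omission of window expiry are all simplifying assumptions made for illustration. What it does show is the contrast with load shedding: every tuple is stored in the join state, and only the probing work of the less productive direction is skipped under overload.

```python
from collections import deque


class HalfWayJoin:
    """Toy symmetric join with two half-way directions.

    Direction "A" means an arriving A tuple probes B's state; direction "B"
    means an arriving B tuple probes A's state. Window expiry is omitted.
    """

    def __init__(self, predicate):
        self.predicate = predicate  # predicate(a_tuple, b_tuple) -> bool
        self.state = {"A": deque(), "B": deque()}
        self.probes = {"A": 0, "B": 0}   # comparisons made per direction
        self.outputs = {"A": 0, "B": 0}  # results produced per direction

    def productivity(self, side):
        """Observed results per comparison for one direction (0.0 if unused)."""
        if self.probes[side] == 0:
            return 0.0
        return self.outputs[side] / self.probes[side]

    def preferred_side(self):
        """Direction with the higher observed productivity (ties favor "A")."""
        return max(("A", "B"), key=self.productivity)

    def insert(self, side, tup, overloaded=False):
        other = "B" if side == "A" else "A"
        # The tuple is always kept so that future tuples can still join with it.
        self.state[side].append(tup)
        if overloaded and side != self.preferred_side():
            return []  # skip probing in the less productive direction
        results = []
        for stored in self.state[other]:
            self.probes[side] += 1
            pair = (tup, stored) if side == "A" else (stored, tup)
            if self.predicate(*pair):
                self.outputs[side] += 1
                results.append(pair)
        return results
```

A tuple whose probe is skipped still sits in the state, so a later tuple arriving on the other side can join with it; under load shedding that opportunity would not exist.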
Research Challenges. In general, the ability of multi-join queries to achieve high result
throughput and to maintain result freshness under heavy workloads relies on resolving
the following aspects of the problem:
 