Database Reference
In-Depth Information
•
While
operator scheduling
[6, 7] tends to allocate resources at the
coarse
granular-
ity of query operators,
adaptation at a finer granularity within the operators
is
required to produce optimal throughput.
•
The existing JDA technique [10] optimizes every join operator individually. In a
pipeline of join operators (Fig. 2), an uncoordinated attempt to optimize opera-
tors
j
individually may jeopardize the real goal of optimizing the overall
query throughput. The join operators within a multi-join plan are
interdependent
,
namely, an operator depends on the output of its upstream
1
operator(s) for input.
Consideration of
operator interdependency is crucial
for a successful plan-level
join direction adaptation.
i
and
We identify
result staleness
2
as a critical issue for CPU-limited processing of
multi-join queries. The biased resource allocation by
JDA
may potentially aggra-
vate this problem.
•
Proposed Approach.
Unlike load shedding that discards data once the system is on
the verge of crashing from overload, we propose to preemptively allocate the available
CPU resources with the goal to achieve maximal throughput. We design, develop and
evaluate a
synchronized
join adaptation strategy at the plan level that tackles the
result
staleness
problem while maximizing the overall throughput of the query. We summa-
rize our contributions as follows:
1. We demonstrate that the state-of-the-art
JDA
[10] technique fails to achieve optimal
throughput for the
CPU
-limited processing of
multi-join
plans (Sec. 3).
2. We establish the
path productivity
metric as the plan-level throughput contribution
of each input stream (Sec. 4.1).
3. We formulate the query throughput maximization as a
knapsack
problem and pro-
pose a
Greedy Path Productivity-based Adaptation (GrePP)
to solve it (Sec.
4.2).
4. We identify result staleness as a key challenge under CPU-limited scenarios and
formulate the
freshness satisfiability
as a
weighted set-cover
problem (Sec. 4.3).
5. We integrated the above two strategies into the JAQPOT algorithm (Sec. 4.4). To
the best of our knowledge, this is the first solution that guarantees fulfillment of
result freshness
predicates while achieving
near optimal
query throughput. We
further note that JAQPOT achieves this effective adaptation in
quadratic time in
the number of input streams
.
6. Our experimental study (Sec. 5) demonstrates the superiority of JAQPOT over the
state-of-the-art JDA solution in a large set of tested cases.
2
Preliminaries
2.1 Background
In this paper we focus on multi-join plans composed of sliding window binary join
operators. We assume standard semantics as in CQL [3]. We use the
unit-time basis
1
Operators closer to the stream input are
upstream
and those closer to the query output are
downstream
.
2
This challenge is not identified by prior work [4, 9, 16].