Database Reference
In-Depth Information
denoted as r
A
C
= 1. Thus, for every 15 tuples consumed by Path
A
in a
second, it will produce 4 t
ABC
joined tuples. The values in Fig. 8 may be
fractional
.
Once a multi-join plan has been translated into a productivity table, our join adaptation
problem can be formulated as a variant of the knapsack problem [14], as given below.
Problem 1.
Join adaptation knapsack problem:
Given a plan Q with k paths Path
1
,
Pa th
2
,...,Path
k
, when Path
I
is assigned resources in multiple M
I
of X
I
, then its through-
put r
I
root
=4as
σ
p
(Path
I
). By defining a
I
to be 1 if Path
I
is chosen in a solution and 0
otherwise, we can formulate this
JDA-Knapsack
problem as:
Maximize
root
=M
I
×ρ
k
I
1
a
I
r
I
root
Σ
×
=
subject to:
k
I
1
a
I
M
I
X
I
Σ
×
×
≤μ.
=
JAQPOT Policy:
Allocate
μ
iteratively to the next most productive path until
μ
gets
completely consumed.
Assume
= 30 tuples/sec. Using the JAQPOT Policy for the productivity table listed
in Fig. 8, the most productive path Path
B
will be assigned 16 tuples (out of 30). The
remaining 14 resources fall short of X
A
=15, the minimum resources required by the
second most productive path Path
A
. Thus, Path
C
will be chosen and assigned 10 re-
sources, wasting the remaining 4 resources. The total throughput thus achieved is 6
tuples/cycle (assume each cycle runs for 1 second). A more effective assignment would
be to instead give the complete 30 (=15
μ
2) tuples/cycle to Path
A
and achieve 8 (=4
×
2) tuples/cycle as throughput. This illustrates that
a
greedy
application of the JAQPOT
Policy fails to achieve optimal throughput
.
Above, we find ourselves working under rigid constraints. First, each execution cy-
cle runs
independently
of its predecessor and successor execution cycles. Second, we
assume a
discrete execution model
where X
I
,r
I
root
×
must be
whole numbers
.Un-
der this model the
throughput optimization
problem does not exhibit the
greedy choice
property
([14]). Thus, a
dynamic programming
knapsack solver must be employed to
achieve an assignment yielding optimal throughput which runs in
and
μ
(k
×μ
), for k input
streams and
, this solver would
be extremely compute-intensive. Therefore, we now explore alternate greedy strategies
for solving this problem.
The Greedy Knapsack Solver.
We now relax the above restrictions. First, instead of
in-
dependent execution cycles
, each being assigned
distinct
μ
available computing resources. For higher values of
μ
resources, we now consider
the
coordinated execution across successive cycles
. For example, two successive cycles
producing 3 and 2 join tuples respectively will result in the overall path productivity
ρ
p
(Path
I
) to be 2.5 tuples/cycle. As we will see shortly, this achieves even higher output
rates than produced under the
discrete
execution model. Once such a group of succes-
sive cycles is identified, we can view their combination as a
mega
cycle. Secondly, X
I
,
r
I
root
μ
values can now be
fractional
. Thus, for Path
B
(Fig. 7), X
B
and
μ
= 16 tuples/sec
and r
B
root
= 5 tuples/sec may be re-phrased as Path
B
using X
B
= 8 tuples/sec to pro-
duce r
B
root
= 2.5 tuples/sec. While
fractional
tuples cannot be consumed (or produced)
in a single cycle, over the span of multiple successive cycles a virtual consumption (or
production) of
fractional
tuples per cycle may arise.