Database Reference
In-Depth Information
denoted as r A
C = 1. Thus, for every 15 tuples consumed by Path A in a
second, it will produce 4 t ABC joined tuples. The values in Fig. 8 may be fractional .
Once a multi-join plan has been translated into a productivity table, our join adaptation
problem can be formulated as a variant of the knapsack problem [14], as given below.
Problem 1. Join adaptation knapsack problem: Given a plan Q with k paths Path 1 ,
Pa th 2 ,...,Path k , when Path I is assigned resources in multiple M I of X I , then its through-
put r I
root =4as
σ
p (Path I ). By defining a I to be 1 if Path I is chosen in a solution and 0
otherwise, we can formulate this JDA-Knapsack problem as:
Maximize
root =M I
×ρ
k
I
1 a I
r I
root
Σ
×
=
subject to:
k
I
1 a I
M I
X I
Σ
×
×
≤μ.
=
JAQPOT Policy: Allocate
μ
iteratively to the next most productive path until
μ
gets
completely consumed.
Assume
= 30 tuples/sec. Using the JAQPOT Policy for the productivity table listed
in Fig. 8, the most productive path Path B will be assigned 16 tuples (out of 30). The
remaining 14 resources fall short of X A =15, the minimum resources required by the
second most productive path Path A . Thus, Path C will be chosen and assigned 10 re-
sources, wasting the remaining 4 resources. The total throughput thus achieved is 6
tuples/cycle (assume each cycle runs for 1 second). A more effective assignment would
be to instead give the complete 30 (=15
μ
2) tuples/cycle to Path A and achieve 8 (=4
×
2) tuples/cycle as throughput. This illustrates that a greedy application of the JAQPOT
Policy fails to achieve optimal throughput .
Above, we find ourselves working under rigid constraints. First, each execution cy-
cle runs independently of its predecessor and successor execution cycles. Second, we
assume a discrete execution model where X I ,r I
root
×
must be whole numbers .Un-
der this model the throughput optimization problem does not exhibit the greedy choice
property ([14]). Thus, a dynamic programming knapsack solver must be employed to
achieve an assignment yielding optimal throughput which runs in
and
μ
(k
×μ
), for k input
streams and
, this solver would
be extremely compute-intensive. Therefore, we now explore alternate greedy strategies
for solving this problem.
The Greedy Knapsack Solver. We now relax the above restrictions. First, instead of in-
dependent execution cycles , each being assigned distinct
μ
available computing resources. For higher values of
μ
resources, we now consider
the coordinated execution across successive cycles . For example, two successive cycles
producing 3 and 2 join tuples respectively will result in the overall path productivity
ρ p (Path I ) to be 2.5 tuples/cycle. As we will see shortly, this achieves even higher output
rates than produced under the discrete execution model. Once such a group of succes-
sive cycles is identified, we can view their combination as a mega cycle. Secondly, X I ,
r I
root
μ
values can now be fractional . Thus, for Path B
(Fig. 7), X B
and
μ
= 16 tuples/sec
and r B
root
= 5 tuples/sec may be re-phrased as Path B
using X B
= 8 tuples/sec to pro-
duce r B
root = 2.5 tuples/sec. While fractional tuples cannot be consumed (or produced)
in a single cycle, over the span of multiple successive cycles a virtual consumption (or
production) of fractional tuples per cycle may arise.
 
Search WWH ::




Custom Search