Database Reference
In-Depth Information
4
The Proposed JAQPOT Approach
4.1 Optimizing Throughput in Multi-join Queries
Throughput optimization in a
multi-join
plan requires
producer-consumer matches
between every successive join pair where all intermediate tuples produced by a
producer
join must get consumed by the downstream
consumer
join for probing the partner join
state. Based on this insight, we propose a synchronized plan level resource allocation
strategy.
Input Paths.
We introduce the notion of
input paths
in Def. 3.
Definition 3.
Given a multi-join plan Q with k input streams
5
(I = 1, ..., k); each
pipeline of half-way joins from the leaf to the root operator forms an
input path
denoted
by Path
I
. A path having
n
join operators between input stream I and the output of query
is called an
n-hop path
.
In our example plan (Fig. 4.b), we identify three input paths, namely, Path
A
,Path
B
and
Path
C
.Path
A
, a 2-hop path, is composed of two sequential half-way joins, namely, (a
S
B
) followed by (ab
S
C
). Along an
n-hop
Path
I
,every
input tuple joins and propagates through
n
half-way joins upto the root. Similar to the
half-way join productivity (
S
C
), also written as (a
S
B
ρ
h
), we now define the
cardinality of intermediate joined
I
j
, produced by the j
th
half-way join along Path
I
.AsinEq.5,
I
j
tuples
, denoted as
φ
φ
is
ρ
h
) along Path
I
computed by multiplying the productivities (
upto j. Here, superscript
p
I
j
denotes the
partner
join state.
φ
forms an important component of the core formulae
that we define next.
I
I
I
I
X
1
X
2
…
X
j
…
X
n
I
j
=
g
=
1
(
X
p
g
S
g
|
φ
I
σ
×|
)
.
(5)
…
…
…
…
Fig. 6.
Division of X
I
resources within Path
I
X
I
I
j
×φ
−
1
X
j
=
X
I
1
×φ
I
j
.
(6)
1
=
(
)
−
I
I
n
1
+φ
1
+...+φ
−
1
Division of Resources within an Input Path.
X
I
resources are allocated
to an
n-hop
path Path
I
. Figure 6 depicts the division of X
I
among the half-way joins
of Path
I
as
probe allowances
X
I
1
,X
I
2
, ..., X
j
, ..., X
I
n
. For an effective division of X
I
producer-consumer matches
must be established between every pair of successive
half-way joins, such that the output of a producer equals the
probe allowance
of the
consumer join. Each such
probe allowance
for the j
th
of the total
μ
half-way join, denoted as X
j
,is
expressed in terms of X
I
I
j
as in Eq. 6. Here, by Eq. 5,
φ
denotes the
cardinality of the
−
1
(j-1)
th
intermediate joined tuples
.
Path Productivity.
We now establish a
novel
metric that
measures the contribution
of an input path to the overall query throughput (Def. 4)
.
5
If stream I is used multiple times as input to the plan (self-joins), then separate copies of I will
used as separate inputs.