Databases Reference
In-Depth Information
2. An S-tuple and a T -tuple agree on C.
If we join R and S first, using the map-reduce algorithm of Section 2.3.7,
then the communication cost is O(r + s), and the size of the intermediate join
R ⊲⊳ S is prs. When we join this result with T , the communication of this
second map-reduce job is O(t + prs). Thus, the entire communication cost of
the algorithm consisting of two 2-way joins is O(r + s + t + prs). If we instead
join S and T first, and then join R with the result, we get another algorithm
whose communication cost is O(r + s + t + pst).
A third way to take this join is to use a single map-reduce job that joins the
three relations at once. Suppose that we plan to use k Reduce tasks for this
job. Pick numbers b and c representing the number of buckets into which we
shall hash B- and C-values, respectively. Let h be a hash function that sends
B-values into b buckets, and let g be another hash function that sends C-values
into c buckets. We require that bc = k; that is, each Reduce task corresponds to
a pair of buckets, one for the B-value and one for the C-value. The Reduce task
corresponding to bucket pair (i, j) is responsible for joining the tuples R(u, v),
S(v, w), and T (w, x) whenever h(v) = i and g(w) = j.
As a result, the Map tasks that send tuples of R, S, and T to the Reduce
tasks that need them must send R- and T -tuples to more than one Reduce task.
For an S-tuple S(v, w), we know the B- and C-values, so we can send this tuple
only to the Reduce task
. However, consider an R-tuple R(u, v).
We know it only needs to go to Reduce tasks
h(v), g(w)
, but we don't know the
value of y. The value of C could be anything as far as we know. Thus, we
must send R(u, v) to c reduce tasks, since y could be any of the c buckets for
C-values. Similarly, we must send the T -tuple T (w, x) to each of the Reduce
tasks
h(v), y
z, g(w)
for any z. There are b such tasks.
h(T.C)=1
g(C)=
h(S.B)=2 and g(S.C)=1
0
1
2
3
0
1
h(B)=
2
h(R.B)=2
3
Figure 2.8: Sixteen Reduce tasks together perform a 3-way join
Search WWH ::




Custom Search