Database Reference
In-Depth Information
EXAMPLE 2.10 Let us see under what circumstances the 3-way join has lower communica-
tion cost than the cascade of two 2-way joins. To make matters simple, let us assume that R ,
S , and T are all the same relation R , which represents the “friends” relation in a social net-
work like Facebook. There are roughly a billion subscribers on Facebook, with an average
of 300 friends each, so relation R has r = 3 × 10 11 tuples. Suppose we want to compute R
R R , perhaps as part of a calculation to find the number of friends of friends of friends
each subscriber has, or perhaps just the person with the largest number of friends of friends
of friends. 8 The cost of the 3-way join of R with itself is
represents the cost of the
is the cost of the Reduce tasks. Since we assume r = 3 × 10 11 , this cost
Map tasks, and
is
Now consider the communication cost of joining R with itself, and then joining the result
with R again. The Map and Reduce tasks for the first join each have a cost of 2 r , so the first
join only has communication cost 4 r = 1 . 2 × 10 12 . But the size of R R is large. We cannot
say exactly how large, since friends tend to fall into cliques, and therefore a person with
300 friends will have many fewer than the maximum possible number of friends of friends,
which is 90,000. Let us estimate conservatively that the size of R R is not 300 r , but only
30 r , or 9 × 10 12 . The communication cost for the second join of ( R R ) R is thus 1 . 8 ×
10 13 + 6 × 10 11 . The total cost of the two joins is therefore 1 . 2 × 10 12 + 1 . 8 × 10 13 + 6 ×
10 11 = 1 . 98 × 10 13 .
We must ask whether the cost of the 3-way join, which is
is less than 1 . 98 × 10 13 . That is so, provided or That is, the 3-way join will
be preferable provided we use no more than 31 2 = 961 reducers.
2.5.4
Exercises for Section 2.5
EXERCISE 2.5.1 What is the communication cost of each of the following algorithms, as a
function of the size of the relations, matrices, or vectors to which they are applied?
(a) The matrix-vector multiplication algorithm of Section 2.3.2 .
(b) The union algorithm of Section 2.3.6 .
(c) The aggregation algorithm of Section 2.3.8 .
(d) The matrix-multiplication algorithm of Section 2.3.10 .
Search WWH ::




Custom Search