Database Reference
In-Depth Information
EXAMPLE
2.10 Let us see under what circumstances the 3-way join has lower communica-
tion cost than the cascade of two 2-way joins. To make matters simple, let us assume that
R
,
S
, and
T
are all the same relation
R
, which represents the “friends” relation in a social net-
work like Facebook. There are roughly a billion subscribers on Facebook, with an average
of 300 friends each, so relation
R
has
r
= 3 × 10
11
tuples. Suppose we want to compute
R
⋈
R
⋈
R
, perhaps as part of a calculation to find the number of friends of friends of friends
each subscriber has, or perhaps just the person with the largest number of friends of friends
represents the cost of the
is the cost of the Reduce tasks. Since we assume
r
= 3 × 10
11
, this cost
Map tasks, and
is
Now consider the communication cost of joining
R
with itself, and then joining the result
with
R
again. The Map and Reduce tasks for the first join each have a cost of 2
r
, so the first
join only has communication cost 4
r
= 1
.
2 × 10
12
. But the size of
R
⋈
R
is large. We cannot
say exactly how large, since friends tend to fall into cliques, and therefore a person with
300 friends will have many fewer than the maximum possible number of friends of friends,
which is 90,000. Let us estimate conservatively that the size of
R
⋈
R
is not 300
r
, but only
30
r
, or 9 × 10
12
. The communication cost for the second join of (
R
⋈
R
)
⋈
R
is thus 1
.
8 ×
10
13
+ 6 × 10
11
. The total cost of the two joins is therefore 1
.
2 × 10
12
+ 1
.
8 × 10
13
+ 6 ×
10
11
= 1
.
98 × 10
13
.
We must ask whether the cost of the 3-way join, which is
is less than 1
.
98 × 10
13
. That is so, provided or That is, the 3-way join will
be preferable provided we use no more than 31
2
= 961 reducers.
□
2.5.4
Exercises for Section 2.5
EXERCISE
2.5.1 What is the communication cost of each of the following algorithms, as a
function of the size of the relations, matrices, or vectors to which they are applied?
(a) The matrix-vector multiplication algorithm of
Section 2.3.2
.
(b) The union algorithm of
Section 2.3.6
.
(c) The aggregation algorithm of
Section 2.3.8
.
(d) The matrix-multiplication algorithm of
Section 2.3.10
.